BRIEF ON APPEAL



Sir: 



Further to the Notice of Appeal filed June 11, 2003, and received by the USPTO on June 13,
2003, herewith are three copies of Appellants' Brief on Appeal. Authorized fees include the statutory
fee of $110 for a one-month extension of time, as well as the $320.00 fee for the filing of this Brief.

This is an appeal from the decision of the Examiner finally rejecting claims 22-29 of the above-identified application.






(1) REAL PARTY IN INTEREST



The above-identified application is assigned of record to Incyte Pharmaceuticals, Inc. (now
Incyte Corporation, formerly known as Incyte Genomics, Inc.) (Reel 9863, Frame 0370), which is the
real party in interest herein.
(2) RELATED APPEALS AND INTERFERENCES
Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 



(3) STATUS OF THE CLAIMS



Claims rejected: Claims 22-29

Claims allowed: (none)

Claims canceled: Claims 1-21

Claims withdrawn: Claims 30-34

Claims on appeal: Claims 22-29 (a copy of the claims on appeal, as amended, can be found in the attached Appendix).



(4) STATUS OF AMENDMENTS AFTER FINAL 
There were no amendments made after final rejection. 



(5) SUMMARY OF THE INVENTION 
Embodiments of Appellants' invention are directed to an isolated polynucleotide encoding a
polypeptide that is a human kinesin light chain homolog, abbreviated as KILCH. As
described in the Specification at page 16, lines 8 to 26:

In one embodiment, the invention encompasses a polypeptide comprising the amino acid
sequence of SEQ ID NO:1, as shown in Figures 1A, 1B, 1C, 1D, 1E, 1F, and 1G. KILCH
is 619 amino acids in length and has two potential cAMP- and cGMP-dependent protein
kinase phosphorylation sites at S519 and S[illegible]; nine potential casein kinase II
phosphorylation sites at S[illegible], S[illegible], T162, S[illegible], S281, S416, T485, T518, and S[illegible]; seven potential
protein kinase C phosphorylation sites at S25, S[illegible], S245, S[illegible], T[illegible], S493, and S526; and three
kinesin light chain repeat signatures from D278 to Q319, R320 to Q361, and R362 to K403. As
shown in Figure 2, KILCH has chemical and structural homology with human KLC (GI
307085; SEQ ID NO:3). In particular, KILCH and human KLC share 66% identity. In
addition, the region of KILCH from N77 to L153 shares 83% identity with the region of
human KLC that contains 11 of the 15 heptad repeats. The region of KILCH from Q234
to K403 shares 87% identity with the region of human KLC that contains four imperfect
tandem repeats. Furthermore, the potential phosphorylation sites at S[illegible], S[illegible], S416, T[illegible],
T485, and S493 of KILCH are conserved in human KLC. A region of unique sequence in
KILCH from about amino acid 6 to about amino acid 17 is encoded by a fragment of
SEQ ID NO:2 corresponding to about nucleotide 184 to about nucleotide 219. Northern
analysis shows the expression of this sequence in various libraries, at least 47% are
associated with cancer and cell proliferation. In particular, 24% of the libraries
expressing KILCH are derived from reproductive tissue, and 17% are derived from neural
tissue.

As such, the claimed invention has numerous practical, beneficial uses in toxicology testing, drug
development and the diagnosis of disease (see the Specification, e.g., at page 37, line 24 to page
38, line 6; page 39, line 10 to page 41, line 23; page 42, lines 19-25; and page 54).

(6) ISSUES 

Whether claims 22-29, directed to polynucleotide sequences that code for KILCH
polypeptides, meet the utility requirement of 35 U.S.C. §101.

(7) GROUPING OF THE CLAIMS 
All of the claims on appeal are grouped together. 

(8) APPELLANTS' ARGUMENTS

Claims 22-29 have been rejected under 35 U.S.C. §101 because the claimed invention 
allegedly "is not supported by either a well-established utility or a disclosed specific and 
substantial credible utility." (3/11/03 Office Action, at page 3) 

The rejection of claims 22-29 is improper, as the inventions of those claims have a 
patentable utility as set forth in the instant specification, and/or a utility well known to one of 
ordinary skill in the art. 

The invention at issue is a polynucleotide sequence corresponding to a gene that is 
expressed in humans. The novel polynucleotide codes for a polypeptide demonstrated in the 
patent specification to be a member of the kinesin family, whose biological functions include the 
transport of membrane bound vesicles and organelles. More particularly, the polypeptide is a 
member of a class of kinesin light chain homologs, whose biological functions include the 
binding or specification of molecular cargo. [See the Specification at pages 2-3] As such, the 
claimed invention has numerous practical, beneficial uses in toxicology testing, drug 

113728 3 09/036,614 



Docket No.: PF-0484.1 CPA 

development, and the diagnosis of disease, none of which requires knowledge of how the 
polypeptide coded for by the polynucleotide actually functions. As a result of the benefits of 
these uses, the claimed invention already enjoys significant commercial success. 

Appellants have previously submitted the declaration of Bedilion (submitted with the
office action response of 11/05/01) describing some of the practical uses of the claimed
invention in gene and protein expression monitoring applications. The Bedilion declaration 
demonstrates that the positions and arguments made by the Patent Examiner with respect to the 
utility of the claimed polynucleotide are without merit. 

The Bedilion declaration describes, in particular, how the claimed expressed
polynucleotide can be used in gene expression monitoring applications that were well-known at
the time the patent application was filed, and how those applications are useful in developing
drugs and monitoring their activity. Dr. Bedilion states that the claimed invention is a useful tool
when employed as a highly specific probe in a cDNA microarray:

Persons skilled in the art would [have appreciated on March 6, 1998] that cDNA 
microarrays that contained SEQ ID NO:2 would be a more useful tool than cDNA
microarrays that did not contain the polynucleotide in connection with conducting gene 
expression monitoring studies on proposed (or actual) drugs for treating neurological, 
reproductive, and cell proliferative disorders for such purposes as evaluating their 
efficacy and toxicity. 

The Patent Examiner does not dispute that the claimed polynucleotide can be used as a 
probe in cDNA microarrays and used in gene expression monitoring applications. Instead, the 
Patent Examiner contends that the claimed polynucleotide cannot be useful without precise 
knowledge of its biological function. But the law never has required knowledge of biological 
function to prove utility. It is the claimed invention's uses, not its functions, that are the subject 
of a proper analysis under the utility requirement. 

As demonstrated by the Bedilion declaration, the person of ordinary skill in the art can
achieve beneficial results from the claimed polynucleotide in the absence of any knowledge as to
the precise function of the protein encoded by it. The uses of the claimed polynucleotide in gene
expression monitoring applications are in fact independent of its precise function.
I. The Applicable Legal Standard

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent
applicant need only show that the claimed invention is "practically useful," Anderson v. Natta,
480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973), and confers a "specific benefit" on the
public. Brenner v. Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a
recent Court of Appeals for the Federal Circuit case, this threshold is not high:

An invention is "useful" under section 101 if it is capable of providing some 
identifiable benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689] 
(1966); Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571
[24 USPQ2d 1401] (Fed. Cir. 1992) ("to violate Section 101 the claimed device
must be totally incapable of achieving a useful result"); Fuller v. Berger, 120 F.
274, 275 (7th Cir. 1903) (test for utility is whether invention "is incapable of
serving any beneficial end").

Juicy Whip, Inc. v. Orange Bang, Inc., 185 F.3d 1364, 51 USPQ2d 1700 (Fed. Cir. 1999).

While an asserted utility must be described with specificity, the patent applicant need not 
demonstrate utility to a certainty. In Carl Zeiss Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180,
20 USPQ2d 1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit 
explained: 

An invention need not be the best or only way to accomplish a certain result, and 
it need only be useful to some extent and in certain applications: "[T]he fact that 
an invention has only limited utility and is only operable in certain applications is 
not grounds for finding lack of utility." Envirotech Corp. v. Al George, Inc., 730
F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir. 1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is 
described so that a person of ordinary skill in the art would understand how to use the claimed 
invention, it is sufficiently specific. See Standard Oil Co. v. Montedison, S.p.A., 212 USPQ
327, 343 (3d Cir. 1981). The specificity requirement is met unless the asserted utility amounts to
a "nebulous expression" such as "biological activity" or "biological properties" that does not
convey meaningful information about the utility of what is being claimed. Cross v. Iizuka,
753 F.2d 1040, 1048 (Fed. Cir. 1985).

In addition to conferring a specific benefit on the public, the benefit must also be 
"substantial." Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" 
utility. Nelson v. Bowler, 626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 
If persons of ordinary skill in the art would understand that there is a "well-established"
utility for the claimed invention, the threshold is met automatically and the applicant need not
make any showing to demonstrate utility. Manual of Patent Examining Procedure
§ 706.03(a). Only if there is no "well-established" utility for the claimed invention must the
applicant demonstrate the practical benefits of the invention. Id.

Once the patent applicant identifies a specific utility, the claimed invention is presumed
to possess it. See In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re
Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that event, the Patent Office
bears the burden of demonstrating that a person of ordinary skill in the art would reasonably
doubt that the asserted utility could be achieved by the claimed invention. Id. To do so, the
Patent Office must provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d
1380, 1391-92, 183 USPQ 288 (CCPA 1974). If, and only if, the Patent Office makes such a
showing, the burden shifts to the applicant to provide rebuttal evidence that would convince the
person of ordinary skill that there is sufficient proof of utility. Brana, 51 F.3d at 1566. The
applicant need only prove a "substantial likelihood" of utility; certainty is not required. Brenner,
383 U.S. at 532.

II. Uses of the claimed polynucleotide for diagnosis of conditions or diseases
characterized by expression of KILCH, for toxicology testing, and for drug
discovery are sufficient utilities under 35 U.S.C. § 101

The claimed invention meets all of the necessary requirements for establishing a credible 
utility under the Patent Law: There are "well-established" uses for the claimed invention known 
to persons of ordinary skill in the art, and there are specific practical and beneficial uses for the 
invention disclosed in the patent application's specification. These uses are explained, in detail,
in the Bedilion declaration. Objective evidence, not considered by the Patent Office, further 
corroborates the credibility of the asserted utilities. 

A. The uses of KILCH for toxicology testing, drug discovery, and disease
diagnosis are practical uses that confer "specific benefits" on the public

The claimed invention has specific, substantial, real-world utility by virtue of its use in 
toxicology testing, drug development and disease diagnosis through gene expression profiling.
These uses are explained in detail in the accompanying Bedilion declaration. There is no dispute
that the claimed invention is in fact a useful tool in cDNA microarrays used to perform gene 
expression analysis. That is sufficient to establish utility for the claimed polynucleotide. 

In his Declaration, Dr. Bedilion explains the many reasons why a person skilled in the art
reading the Hillman '614 application on March 6, 1998 would have understood that application
to disclose the claimed polynucleotide to be useful for a number of gene expression monitoring
applications, e.g., as a highly specific probe for the expression of that specific polynucleotide in
connection with the development of drugs and the monitoring of the activity of such drugs.
(Bedilion Declaration at, e.g., ¶¶ 10-15). Much, but not all, of Dr. Bedilion's explanation
concerns the use of the claimed polynucleotide in cDNA microarrays of the type first developed
at Stanford University for evaluating the efficacy and toxicity of drugs, as well as for other
applications. (Bedilion Declaration, ¶¶ 12 and 15).¹

In connection with his explanations, Dr. Bedilion states that the "Hillman '614
specification would have led a person skilled in the art in March 1998 who was using gene
expression monitoring in connection with working on developing new drugs for the treatment of
neurological, reproductive, and cell proliferative disorders [a] to conclude that a cDNA
microarray that contained the SEQ ID NO:2 polynucleotide would be a highly useful tool and [b]
to request specifically that any cDNA microarray that was being used for such purposes to
contain the SEQ ID NO:2 polynucleotide" (Bedilion Declaration, ¶ 15). For example, as
explained by Dr. Bedilion, "[p]ersons skilled in the art would [have appreciated on March 6,
1998] that cDNA microarrays that contained the claimed polynucleotide would be a more useful
tool than cDNA microarrays that did not contain the polynucleotide in connection with
conducting gene expression monitoring studies on proposed (or actual) drugs to treat
neurological, reproductive, and cell proliferative disorders for such purposes as evaluating their
efficacy and toxicity." Id.

¹Dr. Bedilion also explained, for example, why persons skilled in the art would also
appreciate, based on the Hillman '614 specification, that the claimed polynucleotide would be
useful in connection with developing new drugs using technology, such as Northern analysis, that
predated by many years the development of the cDNA technology (Bedilion Declaration, ¶ 16).
In support of those statements, Dr. Bedilion provided detailed explanations of how cDNA
technology can be used to conduct gene expression monitoring evaluations, with extensive
citations to pre-March 6, 1998 publications showing the state of the art on March 6, 1998.
(Bedilion Declaration, ¶¶ 10-14). While Dr. Bedilion's explanations in paragraph 15 of his
Declaration include almost four pages of text and seven subparts (a)-(g), he specifically states
that his explanations are not "all-inclusive." Id. For example, with respect to toxicity
evaluations, Dr. Bedilion had earlier explained how persons skilled in the art who were working
on drug development on March 6, 1998 (and for several years prior to March 6, 1998) "without
any doubt" appreciated that the toxicity (or lack of toxicity) of any proposed drug was "one of the
most important criteria to be evaluated in connection with the development of the drug" and how
the teachings of the Hillman '614 application clearly include using differential gene expression
analyses in toxicity studies (Bedilion Declaration, ¶ 10).

Thus, the Bedilion Declaration establishes that persons skilled in the art reading the 
Hillman '614 application at the time it was filed "would have wanted their cDNA microarray to 
have a SEQ ID NO: 2 probe because a microarray that contained such a probe (as compared to 
one that did not) would provide more useful results in the kind of gene expression monitoring 
studies using cDNA microarrays that persons skilled in the art have been doing since well prior 
to March 6, 1998" (Bedilion Declaration, ¶ 15, item (g)). This, by itself, provides more than
sufficient reason to compel the conclusion that the Hillman '614 application disclosed to persons 
skilled in the art at the time of its filing substantial, specific and credible real-world utilities for 
the claimed polynucleotide. 

Nowhere does the Patent Examiner address the fact that, as described on page 51 of the
Hillman '614 application, the claimed polynucleotides can be used as highly specific probes in,
for example, cDNA microarrays: probes that without question can be used to measure both the
existence and amount of complementary RNA sequences known to be the expression products of
the claimed polynucleotides. The claimed invention is not, in that regard, some random sequence
whose value as a probe is speculative or would require further research to determine.

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a
measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for
measuring weight. This use as a measuring tool, regardless of how the expression level data
ultimately would be used by a person of ordinary skill in the art, by itself demonstrates that the
claimed invention provides an identifiable, real-world benefit that meets the utility requirement.
Raytheon v. Roper, 724 F.2d 951 (Fed. Cir. 1983) (claimed invention need only meet one of its
stated objectives to be useful); In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the
invention works is irrelevant to utility); MPEP § 2107 ("Many research tools such as gas
chromatographs, screening assays, and nucleotide sequencing techniques have a clear, specific,
and unquestionable utility (e.g., they are useful in analyzing compounds)" (emphasis added)).

Although Appellants need not prove as much to demonstrate utility, there can be no reasonable
dispute that persons of ordinary skill in the art have numerous uses for information about relative
gene expression including, for example, understanding the effects of a potential drug for treating
neurological, reproductive, and cell proliferative disorders. Because the patent application states
explicitly that the claimed polynucleotide is known to be expressed in normal cells as well
as in cancerous and immortalized cells, including reproductive and neural tissues (see the Hillman
'614 application at page 16), and encodes a protein that is a member of a class of kinesins
known to be associated with diseases such as neurological, reproductive, and cell proliferative
disorders, there can be no reasonable dispute that a person of ordinary skill in the art could put
the claimed invention to such use. In other words, the person of ordinary skill in the art can
derive more information about a potential drug candidate for neurological, reproductive, or cell
proliferative disorders, or about a potential toxin, with the claimed invention than without it (see
Bedilion Declaration at, e.g., ¶ 15, subparts (e)-(f)).

The Bedilion Declaration shows that a number of pre-March 6, 1998 publications confirm
and further establish the utility of cDNA microarrays in a wide range of drug development gene
expression monitoring applications at the time the Hillman '614 application was filed (Bedilion
Declaration ¶¶ 10-14; Bedilion Exhibits A-G). Indeed, Brown and Shalon U.S. Patent No.
5,807,522 (the Brown '522 patent, Bedilion Exhibit D), which issued from a patent application
filed in June 1995 and was effectively published on December 29, 1995 as a result of the
publication of a PCT counterpart application, shows that the Patent Office recognizes the
patentable utility of the cDNA technology developed in the early to mid-1990s. As explained by
Dr. Bedilion, among other things (Bedilion Declaration, ¶ 12):

The Brown '522 patent further teaches that the "[m]icroarrays of immobilized 
nucleic acid sequences prepared in accordance with the invention" can be used in
"numerous" genetic applications, including "monitoring of gene expression"
applications (see [Bedilion] Tab D at col. 14, lines 36-42). The Brown '522 
patent teaches (a) monitoring gene expression (i) in different tissue types, (ii) in 
different disease states, and (iii) in response to different drugs, and (b) that arrays 
disclosed therein may be used in toxicology studies (see [Bedilion] Tab D at col. 
15, lines 13-18 and 52-58 and col. 18, lines 25-30). 

Literature reviews published shortly after the filing of the Hillman '614 application 
describing the state of the art further confirm the claimed invention's utility. Rockett, et al. 
confirm, for example, that the claimed invention is useful for differential expression analysis 
regardless of how expression is regulated: 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular 
analysis, recognition of the importance of differential gene expression and 
characterization of differentially expressed genes has existed for many years. 

* * * 

Although differential expression technologies are applicable to a broad range of 
models, perhaps their most important advantage is that, in most cases, absolutely 
no prior knowledge of the specific genes which are up- or down-regulated is 
required. 

* * * 

Whereas it would be informative to know the identity and functionality of all 
genes up/down regulated by . . . toxicants, this would appear a longer term goal.
However, the current use of gene profiling yields a pattern of gene changes
for a xenobiotic of unknown toxicity which may be matched to that of well
characterized toxins, thus alerting the toxicologist to possible in vivo similarities
between the unknown and the standard, thereby providing a platform for more
extensive toxicological examination. (emphasis added)

Rockett et al., Differential gene expression in drug metabolism and toxicology: practicalities,
problems and potential, 29 Xenobiotica No. 7, 655 (1999).
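The matching step Rockett et al. describe can be illustrated with a short sketch. It assumes that each compound's effect is summarized as a vector of log2 expression ratios over the same set of genes, and it ranks reference toxins by Pearson correlation with the profile of the unknown compound. The compound names, the five-gene profiles, and the choice of Pearson correlation are hypothetical illustrations only; published toxicogenomics work may use other similarity measures or clustering methods.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def closest_reference(unknown, references):
    """Return (name, r) for the reference profile most correlated with `unknown`."""
    scores = {name: pearson(unknown, profile) for name, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical log2(treated / control) ratios for the same five genes.
references = {
    "toxin_A": [2.0, -1.0, 0.5, 0.0, 1.5],
    "toxin_B": [-1.5, 2.0, 0.0, 1.0, -0.5],
}
unknown = [1.8, -0.9, 0.4, 0.1, 1.2]

name, r = closest_reference(unknown, references)
print(name)  # toxin_A
```

As in the quoted passage, no knowledge of what any individual gene does is needed. The sketch compares only the overall pattern of up- and down-regulation.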

In another pre-March 6, 1998 article, Lashkari, et al. state explicitly that sequences that 
are merely "predicted" to be expressed (predicted Open Reading Frames, or ORFs) - the claimed 
invention in fact is known to be expressed - have numerous uses: 

Efforts have been directed toward the amplification of each predicted ORF or any 
other region of the genome ranging from a few base pairs to several kilobase
pairs. There are many uses for these amplicons: they can be cloned into standard
vectors or specialized expression vectors, or can be cloned into other specialized
vectors such as those used for two-hybrid analysis. The amplicons can also be
used directly by, for example, arraying onto glass for expression analysis, for
DNA binding assays, or for any direct DNA assay.

Lashkari, et al., Whole genome analysis: Experimental access to all genome sequenced segments
through larger-scale efficient oligonucleotide synthesis and PCR, 94 Proc. Nat'l Acad. Sci. 8945
(Aug. 1997) (emphasis added).

B. The use of nucleic acids coding for proteins expressed by humans as tools for
toxicology testing, drug discovery, and the diagnosis of disease is now "well-established"

The technologies made possible by expression profiling and the DNA tools upon which 
they rely are now well-established. The technical literature recognizes not only the prevalence of 
these technologies, but also their unprecedented advantages in drug development, testing and 
safety assessment. These technologies include toxicology testing, as described by Bedilion in his 
declaration. 

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g., 
John C. Rockett, et al., supra: 

Knowledge of toxin-dependent regulation in target tissues is not solely an academic 
pursuit as much interest has been generated in the pharmaceutical industry to harness this 
technology in the early identification of toxic drug candidates, thereby shortening the 
developmental process and contributing substantially to the safety assessment of new 
drugs. 

To the same effect are several other scientific publications, including Emile F. Nuwaysir, et al.,
Microarrays and Toxicology: The Advent of Toxicogenomics, 24 Molecular Carcinogenesis 153
(1999); Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology - potentials
and limitations, 112-13 Toxicology Letters 467 (2000).

Nucleic acids useful for measuring the expression of whole classes of genes are routinely
incorporated for use in toxicology testing. Nuwaysir et al. describe, for example, a Human
ToxChip comprising 2089 human clones, which were selected

for their well-documented involvement in basic cellular processes as well as their 
responses to different types of toxic insult. Included on this list are DNA replication and
repair genes, apoptosis genes, and genes responsive to PAHs and dioxin-like compounds, 
peroxisome proliferators, estrogenic compounds, and oxidant stress. Some of the other 
categories of genes include transcription factors, oncogenes, tumor suppressor genes, 
cyclins, kinases, phosphatases, cell adhesion and motility genes, and homeobox genes. 
Also included in this group are 84 housekeeping genes, whose hybridization intensity is 
averaged and used for signal normalization of the other genes on the chip. 

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of special 
interest in making a human toxicology microarray). 
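Nuwaysir et al. state only that the housekeeping-gene intensities are averaged and used for signal normalization. The sketch below assumes the simplest reading of that step for a two-channel (treated versus control) experiment: each channel's raw intensities are divided by the mean of that channel's housekeeping intensities, and log2 ratios are then taken. The gene names and intensity values are hypothetical, and the actual ToxChip processing may differ.

```python
import math


def normalize(intensities, housekeeping):
    """Divide every raw intensity by the mean intensity of the housekeeping genes."""
    reference = sum(intensities[g] for g in housekeeping) / len(housekeeping)
    return {gene: value / reference for gene, value in intensities.items()}


def log2_ratios(treated, control, housekeeping):
    """Housekeeping-normalized log2(treated / control) for every non-housekeeping gene."""
    t = normalize(treated, housekeeping)
    c = normalize(control, housekeeping)
    return {g: math.log2(t[g] / c[g]) for g in t if g not in housekeeping}


# Hypothetical raw intensities; the treated channel is uniformly twice as bright.
housekeeping = ["hk1", "hk2"]
control = {"hk1": 1000.0, "hk2": 1000.0, "geneX": 500.0, "geneY": 2000.0}
treated = {"hk1": 2000.0, "hk2": 2000.0, "geneX": 4000.0, "geneY": 4000.0}

print(log2_ratios(treated, control, housekeeping))
# {'geneX': 2.0, 'geneY': 0.0}
```

Without the normalization step, geneY would appear to be induced twofold (4000 versus 2000), even though its signal relative to the housekeeping genes is unchanged. The result also depends on the housekeeping genes themselves staying constant, which is the concern raised by the correspondence discussed below.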

The more genes that are available for use in toxicology testing, the more powerful the 
technique. "Arrays are at their most powerful when they contain the entire genome of the species 
they are being used to study." John C. Rockett and David J. Dix, Application of DNA Arrays to
Toxicology, 107 Environ. Health Perspect. 681, No. 8 (1999). Control genes are carefully selected
for their stability across a large set of array experiments in order to best study the effect of
toxicological compounds. See attached email from the primary investigator on the Nuwaysir
paper, Dr. Cynthia Afshari, to an Incyte employee, dated July 3, 2000, as well as the original
message to which she was responding, indicating that even the expression of carefully selected 
control genes can be altered. Thus, there is no expressed gene which is irrelevant to screening for 
toxicological effects, and all expressed genes have a utility for toxicological screening. 

In fact, the potential benefit to the public, in terms of lives saved and reduced health care
costs, is enormous. Recent developments provide evidence that the benefits of this information
are already beginning to manifest themselves. Examples include the following:

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene
expression technology, information about the structure of a known transporter
gene, and chromosomal mapping location, to identify the key gene associated with
Tangier disease. This discovery took place over a matter of only a few weeks,
due to the power of these new genomics technologies. The discovery received an
award from the American Heart Association as one of the top 10 discoveries
associated with heart disease research in 1999.

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte
customer stated that it had reduced the time associated with target discovery and
validation from 36 months to 18 months, through use of Incyte's genomic
information database. Other Incyte customers have privately reported similar 
experiences. The implications of this significant saving of time and expense for 
the number of drugs that may be developed and their cost are obvious. 
• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer
stated that over 50 percent of the drug targets in its current pipeline were derived
from the Incyte database. Other Incyte customers have privately reported similar 
experiences. By doubling the number of targets available to pharmaceutical 
researchers, Incyte genomic information has demonstrably accelerated the 
development of new drugs. 

Because the Patent Examiner failed to address or consider the "well-established" utilities 
for the claimed invention in toxicology testing, drug development, and the diagnosis of disease, 
the Examiner's rejection should be withdrawn regardless of its merit. 

C. The similarity of the polypeptide encoded by the claimed invention to 
another polypeptide of undisputed utility demonstrates utility

In addition to having substantial, specific and credible utilities in numerous gene 
expression monitoring applications, the utility of the claimed polynucleotide can be imputed 
based on the relationship between the polypeptide it encodes, KILCH, and another polypeptide of 
unquestioned utility, kinesin light chain. The two polypeptides have sufficient similarities in 
their sequences that a person of ordinary skill in the art would recognize more than a reasonable 
probability that the polypeptide encoded for by the claimed invention has utility similar to 
kinesin light chain. Appellants need not show any more to demonstrate utility. In re Brana, 51
F.3d at 1567.

It is undisputed, and readily apparent from the patent application, that the polypeptide 
encoded for by the claimed polynucleotide shares more than 66% sequence identity over 619 
amino acid residues with kinesin light chain. [See Figure 2] "In addition, the region of KILCH
from N77 to L153 shares 83% identity with the region of human KLC that contains 11 of the 15
heptad repeats" and "the region of KILCH from Q234 to K403 shares 87% identity with the region 
of human KLC that contains four imperfect tandem repeats." [Specification at page 16, lines 18- 
20] This is more than enough homology to demonstrate a reasonable probability that the utility 
of kinesin light chain can be imputed to the claimed invention (through the polypeptide it 
encodes). It is well-known that the probability that two unrelated polypeptides share more than 
40% sequence homology over 70 amino acid residues is exceedingly small. Brenner et al., Proc. 
Natl. Acad. Sci. 95:6073-78 (1998). Given homology in excess of 40% over many more than 70 
amino acid residues, the probability that the polypeptide encoded for by the claimed
polynucleotide is related to kinesin light chain is, accordingly, very high. 

The Examiner must accept Appellants' demonstration that the homology between the
polypeptide encoded for by the claimed invention and kinesin light chain demonstrates utility by
a reasonable probability unless the Examiner can demonstrate through evidence or sound
scientific reasoning that a person of ordinary skill in the art would doubt utility. See In re
Langer, 503 F.2d 1380, 1391-92, 183 USPQ 288 (CCPA 1974). The Examiner has not provided
sufficient evidence or sound scientific reasoning to the contrary. 

D. Objective evidence corroborates the utilities of the claimed invention

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider 
in determining whether a "real-world" utility exists. Indeed, "real-world" evidence, such as
evidence showing actual use or commercial success of the invention, can provide conclusive
proof of utility. Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55
F.2d 854, 856, 12 USPQ 335 (6th Cir. 1932). Moreover, proof that the invention is made, used or
sold by any person or entity other than the patentee is conclusive proof of utility. United States
Steel Corp. v. Phillips Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989).

Over the past several years, a vibrant market has developed for databases containing all 
expressed genes (along with the polypeptide translations of those genes), in particular genes 
having medical and pharmaceutical significance such as the instant sequence. (Note that the 
value in these databases is enhanced by their completeness, but each sequence in them is 
independently valuable.) The databases sold by Appellants' assignee, Incyte, include exactly the 
kinds of information made possible by the claimed invention, such as tissue and disease 
associations. Incyte sells its database containing the claimed sequence and millions of other
sequences throughout the scientific community, including to pharmaceutical companies who use
the information to develop new pharmaceuticals.

Both Incyte's customers and the scientific community have acknowledged that Incyte's
databases have proven to be valuable in, for example, the identification and development of drug
candidates. As Incyte adds information to its databases, including the information that can be
generated only as a result of Incyte's discovery of the claimed polynucleotide and its use of that
polynucleotide on cDNA microarrays, the databases become even more powerful tools. Thus the
claimed invention adds more than incremental benefit to the drug discovery and development
process.

III. The Patent Examiner's Rejections Are Without Merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to 
dismiss it altogether by arguing that the disclosed and well-established utilities for the claimed 
polynucleotide are not "specific . . . substantial [and] credible" utilities. (3/11/03 Office Action, 
at page 3). The Examiner is incorrect both as a matter of law and as a matter of fact. 

A. The Precise Biological Role Or Function Of An Expressed Polynucleotide Is 
Not Required To Demonstrate Utility 

The Patent Examiner's primary rejection of the claimed invention is based on the ground 
that, without information as to the precise "biological role" of the claimed invention, the claimed 
invention's utility is not sufficiently specific. According to the Examiner, it is not enough that a 
person of ordinary skill in the art could use and, in fact, would want to use the claimed invention 
either by itself or in a cDNA microarray to monitor the expression of genes for such applications 
as the evaluation of a drug's efficacy and toxicity. The Examiner would require, in addition, that 
the applicant provide a specific and substantial interpretation of the results generated in any given 
expression analysis. 

It may be that specific and substantial interpretations and detailed information on
biological function are necessary to satisfy the requirements for publication in some technical
journals, but they are not necessary to satisfy the requirements for obtaining a United States
patent. The relevant question is not, as the Examiner would have it, whether it is known how or
why the invention works, In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather
whether the invention provides an "identifiable benefit" in presently available form. Juicy Whip
Inc. v. Orange Bang Inc., 185 F.3d 1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is
a substantial likelihood the invention provides the benefit, it is useful. There can be no doubt,
particularly in view of the Bedilion Declaration (at, e.g., ¶¶ 10 and 15), that the present invention
meets this test.
The threshold for determining whether an invention produces an identifiable benefit is
low. Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of 
ordinary skill in the art would not know how to achieve an identifiable benefit and, at least 
according to the PTO guidelines, so-called "throwaway" utilities that are not directed to a person 
of ordinary skill in the art at all, do not meet the statutory requirement of utility. Utility 
Examination Guidelines, 66 Fed. Reg. 1092 (Jan. 5, 2001). 

Knowledge of the biological function or role of a biological molecule has never been
required to show real-world benefit. In its most recent explanation of its own utility guidelines,
the PTO acknowledged as much (66 Fed. Reg. at 1095):

[T]he utility of a claimed DNA does not necessarily depend on the function of the 
encoded gene product. A claimed DNA may have specific and substantial utility 
because, e.g., it hybridizes near a disease-associated gene or it has gene-regulating 
activity. 

By implicitly requiring knowledge of biological function for any claimed nucleic acid, the 
Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute 
requirement of utility. Rather than looking to the biological role or function of the claimed 
invention, the Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a Class of Useful Products Can Be Proof of Utility 

Despite the uncontradicted evidence that the claimed polynucleotide encodes a 
polypeptide in the kinesin family, the Examiner refused to impute the utility of the members of 
the kinesin family to KILCH. In the 07/03/01 Office Action, the Patent Examiner takes the 
position that, unless Appellants can identify which particular biological function within the class 
of kinesin light chain homologs is possessed by KILCH, utility cannot be imputed (See the 
07/03/01 Office Action, at page 3). To demonstrate utility by membership in the class of kinesin
light chain homologs, the Examiner would require that all kinesins possess a "common" utility.

There is no such requirement in the law. In order to demonstrate utility by membership in 
a class, the law requires only that the class not contain a substantial number of useless members. 
So long as the class does not contain a substantial number of useless members, there is sufficient 
likelihood that the claimed invention will have utility, and a rejection under 35 U.S.C. § 101 is 
improper. That is true regardless of how the claimed invention ultimately is used and whether or 
not the members of the class possess one utility or many. See Brenner v. Manson, 383 U.S. 519,
532 (1966); Application of Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class 
contains a sufficient number of useless members such that a person of ordinary skill in the art 
could not impute utility by a substantial likelihood. There would be, in that case, a substantial 
likelihood that the claimed invention is one of the useless members of the class. In the few cases 
in which class membership did not prove utility by substantial likelihood, the classes did in fact
include predominantly useless members. E.g., Brenner (man-made steroids); Kirk (same); Natta
(man-made polyethylene polymers).

The Examiner addresses KILCH as if the general class in which it is included is not the 
kinesin family, but rather all polynucleotides or all polypeptides, including the vast majority of 
useless theoretical molecules not occurring in nature, and thus not pre-selected by nature to be 
useful. While these "general classes" may contain a substantial number of useless members, the 
kinesin family does not. The kinesin family is sufficiently specific to rule out any reasonable 
possibility that KILCH would not also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the kinesin light chain class of 
kinesins has any, let alone a substantial number, of useless members, the Examiner must 
conclude that there is a "substantial likelihood" that the KILCH encoded by the claimed 
polynucleotide is useful. It follows that the claimed polynucleotide also is useful. 

Even if the Examiner's "common utility" criterion were correct - and it is not - the 
kinesin family would meet it. It is undisputed that known members of the kinesin family are 
involved in translocating components within cells. A person of ordinary skill in the art need not 
know any more about how the claimed invention translocates components within cells to use it, 
and the Examiner presents no evidence to the contrary. Instead, the Examiner makes the 
conclusory observation that a person of ordinary skill in the art would need to know whether, for
example, any given kinesin translocates components within cells. The Examiner then goes on to 
assume that the only use for KILCH absent knowledge as to how the kinesin actually works is 
further study of KILCH itself. 

Not so. As demonstrated by Appellants, knowledge that KILCH is a kinesin is more than 
sufficient to make it useful for the diagnosis and treatment of neurological, reproductive, and cell 
proliferative disorders. Indeed, KILCH has been shown to be expressed in neurological,
reproductive and proliferating tissues. The Examiner must accept these facts to be true unless the 
Examiner can provide evidence or sound scientific reasoning to the contrary. But the Examiner 
has not done so. 

C. Because the uses of KILCH in toxicology testing, drug discovery, and disease 
diagnosis are practical uses beyond mere study of the invention itself, the 
claimed invention has substantial utility

The Examiner's rejection of the claims at issue is tantamount to an assertion that the use
of an invention as a tool for research is not a substantial use.

There is no authority for the proposition that use as a tool for research is not a substantial
utility. Indeed, the Patent Office has recognized that just because an invention is used in a
research setting does not mean that it lacks utility (MPEP § 2107):

Many research tools such as gas chromatographs, screening assays, and nucleotide
sequencing techniques have a clear, specific and unquestionable utility (e.g., they are
useful in analyzing compounds). An assessment that focuses on whether an invention is
useful only in a research setting thus does not address whether the specific invention is in
fact "useful" in a patent sense. Instead, Office personnel must distinguish between
inventions that have a specifically identified utility and inventions whose specific utility
requires further research to identify or reasonably confirm.

The Patent Office's actual practice has been, at least until the present, consistent with that
approach. It has routinely issued patents for inventions whose only use is to facilitate research,
such as DNA ligases. The PTO's own Training Materials acknowledge that such research tools are
useful, as are DNA sequences used, for example, as markers.

Only a limited subset of research uses are not "substantial" utilities: those in which the 
only known use for the claimed invention is to be an object of further study, thus merely inviting 
further research. This follows from Brenner, in which the U.S. Supreme Court held that a 
process for making a compound does not confer a substantial benefit where the only known use 
of the compound was to be the object of further research to determine its use. Id. at 535.
Similarly, in Kirk, the Court held that a compound would not confer substantial benefit on the 
public merely because it might be used to synthesize some other, unknown compound that would 
confer substantial benefit. Kirk, 376 F.2d at 940, 945 ("What appellants are really saying to 
those in the art is take these steroids, experiment, and find what use they do have as medicines."). 
Nowhere do those cases state or imply, however, that a material cannot be patentable if it has
some other beneficial use in research. 

As used in toxicology testing, drug discovery, and disease diagnosis, the claimed 
invention has a beneficial use in research other than studying the claimed invention or its protein 
products. It is a tool, rather than an object, of research. The data generated in gene expression 
monitoring using the claimed invention as a tool is not used merely to study the claimed 
polynucleotide itself, but rather to study properties of tissues, cells, and potential drug candidates 
and toxins. Without the claimed invention, the information regarding the properties of tissues, 
cells, drug candidates and toxins is less complete. [Bedilion Declaration at ¶ 15.]

The claimed invention has numerous additional uses as a research tool, each of which 
alone is a "substantial utility." These include uses such as chromosomal markers and probes. 

D. The Patent Examiner failed to demonstrate that a person of ordinary skill in the art 
would reasonably doubt the utility of the claimed invention 

The Examiner has cited several references in order to assert, inter alia, that "the relevant 
literature reports examples of polypeptide families wherein individual members have distinct, 
and sometimes even opposite, biological activities" (3/11/03 Office Action, at page 6). These 
references fail to provide support for the Examiner's position, as is explained in detail below. 

The literature cited by the Examiner is not inconsistent with Appellants' proof of
homology by a reasonable probability. It may show that Appellants cannot prove function by
homology with certainty, but Appellants need not meet such a rigorous standard of proof.
Under the applicable law, once the applicant demonstrates a prima facie case of homology, the
Examiner must accept the assertion of utility to be true unless the Examiner comes forward with
evidence showing a person of ordinary skill would doubt the asserted utility could be achieved by
a reasonable probability. See In re Brana, 51 F.3d at 1566; In re Langer, 503 F.2d 1380,
1391-92, 183 USPQ 288 (CCPA 1974). The Examiner has not made such a showing and, as such, the
Examiner's rejection should be reversed.

The Examiner cites Tischer et al. and Benjamin et al. as disclosing that VEGF and PDGF 
have opposite mitogenic activities. Appellants respectfully point out that the sequence homology 
between VEGF and the PDGF A and B chains is quite low, with little more than the conserved 
cysteine residues being conserved between the sequences (see Figures 4 A, 4B, and 7 of Tischer
et al.). The homology is far less than the 66% identity for KILCH and its closest homologs. That 
VEGF and PDGF do not share the same function is therefore hardly surprising, and does not in 
any way imply that KILCH, with far greater homology to kinesin light chain (KLC), would not 
share the function of this protein. 

The Examiner cites Massague and Vukicevic et al. as disclosing that related members of 
the TGF family of proteins have different functions. Appellants note that the different members 
of the TGF family have from 22-70% sequence identity (and that the most closely related 
members are subunits of the same heterodimeric protein, not separate proteins with different 
functions) (Massague, page 437, col. 1; and Figure 1). In most cases, therefore, the homology 
between TGF superfamily members is less than that observed between KILCH and KLC. These 
references disclosing differing functions in family members less closely related in sequence than
KILCH and KLC and spanning an entire superfamily of proteins, would not serve to make one of 
ordinary skill in the art reasonably doubt that KILCH would have similar functions to the more 
closely related KLC and human KLC. 

The Examiner cites Pilbeam as allegedly disclosing two structurally closely related 
proteins, PTH and PTHrP, which can have opposite effects on bone resorption. The Examiner 
fails to notice that "[t]here is strong homology of PTHrP with PTH only in the amino-terminal 
domain" (Pilbeam, page 717, col. 2) and that N-terminal fragments of both proteins in fact have 
similar biological activities (Pilbeam, page 717, col. 2). It is only when the non-homologous
C-terminal regions are added that the different activities emerge. Thus this reference supports
Appellants' arguments that homologous protein sequences have similar functions. 

Citing Kopchick et al., the Examiner appears to be making the proposition that the
biological activity of KILCH is not credible because a single amino acid change in a protein can
result in distinct biological activities. (See the 3/11/03 Office Action, at page 7). However, this
proposition lacks proper evidentiary relevance. As stated in Boehringer Ingelheim Vetmedica
Inc. v. Schering-Plough Corp., the "fact that even a single nucleotide or amino acid
substitution may drastically alter the function of a gene or protein is not evidence of anything at
all." Boehringer Ingelheim Vetmedica Inc. v. Schering-Plough Corp., 320 F.3d 1339, 1351 (Fed.
Cir. 2003). Here, the Examiner presents no evidence whatsoever of the effect that even a single
amino acid change would have on the function of the claimed polypeptides. Further, the
observations in Kopchick et al. resulted from mutations which were deliberately engineered so as 
to alter an essential functional residue (see Kopchick, page 9, lines 21-34) and thus are irrelevant 
to the question of whether a naturally-occurring sequence retains the function of its homolog, as 
is the case here. 

The Examiner also cites various references to support the assertion that "function cannot
be predicted based solely on similarity to a protein found in the sequence databases" (3/11/03
Office Action, at page 5). For example, the Examiner cites Skolnick et al. as allegedly
demonstrating that knowing the protein structure by itself is insufficient to annotate a number of
functional classes. However, Skolnick et al. disclose that only 30-50% of proteins have a
function that cannot be assigned by any current method (page 37, col. 2). This makes it more
likely than not that the claimed polypeptides, which are homologous to members of known and
well-characterized functional families (the kinesins and kinesin light chain homologs), are among
the group that can be properly annotated.

Furthermore, Skolnick et al. disclose that "enzyme active sites are indeed more highly 
conserved than other parts of the protein" (page 35, col. 1). KILCH contains three kinesin light
chain repeat signatures, from D278 to Q319, R320 to Q361, and R352 to K403. KILCH shares 87%
identity with KLC within the region Q234 to K403. (See the Specification at page 16, lines 13-14.)
The high degree of conservation of these known sites would lead one of skill in the art to
consider it highly probable that KILCH possesses KLC activity.

The Examiner cites Bork and Doerks et al. as stating that the error rate of functional 
annotation in the sequence database is considerable, making it difficult to infer correct function 
by comparison to sequences in the databases as errors are copied and propagated. To reinforce 
this point the Examiner also cites Bork et al., to the effect that "questionable interpretations are
written into the sequence database and are then considered facts" (3/11/03 Office Action, at page
6). Appellants note that these references (as well as the others cited) pertain to automated
sequence annotation by "software robots". (See, for example, Doerks et al. page 248, col. 1;
Bork page 398, col. 1; and Bork et al. page 426, col. 1; as well as Smith et al., page 1222, col. 1;
and Brenner, page 132, col. 1).

This issue is not relevant to the case here, since the claimed polypeptides were not 
assigned an annotation by a computer, but subjected to analysis by trained scientists who noted
not only the homology to known KLC but also the conservation of KLC signature sequences. 
Unlike the examples in Bork et al., where software robots assigned functions based upon 
structural similarity of only a small domain of the new protein to a small domain of a known 
protein, KILCH is homologous to KLC over the full length of the sequences. 

Doerks et al. also discusses pitfalls in protein annotation such as misleading sequence 
similarities to regions that are not the active site, or failure to conserve active site residues. 
These considerations do not apply in the instant case, as KILCH is identified as a kinesin light
chain polypeptide not only by homology to a known kinesin light chain polypeptide (rather than to
incorrectly annotated unknown proteins, as was the case for one of the examples in Doerks et al.,
page 248, col. 3) but also by characteristic conserved domains found in the kinesin family of
polypeptides. Nor has the Examiner provided any evidence that KILCH shares significant
homology with proteins that are not kinesin light chain polypeptides (parallel to the example in
Doerks et al. in which a single sequence had homology to multiple hits from different protein 
families, shown on page 249). 

The Examiner cites Smith et al. as arguing that there are numerous cases in which
proteins of different functions share structural similarity due to evolution from a common
ancestor. Appellants note that the only example provided is that of the transducin homologs,
which share common WD repeat regions (Smith et al. page 1222, col. 3) but not necessarily any
large amount of overall homology, unlike KILCH and KLC, which are homologous over their full length.

The Examiner also cites Brenner et al. to argue that since there are only about 1000 major 
gene superfamilies in nature, most homologs must have different molecular and cellular 
functions. Again, this is a generality that does not apply in the instant case, since the claimed 
polypeptides have been identified with much greater specificity than merely as a member of a
"superfamily." The KILCH protein has been identified as a kinesin light chain polypeptide.
This is a fine enough distinction to identify KILCH as having the specific function of its limited
subfamily.

Finally, the Examiner cites Bowie et al., as stating that the determination of three- 
dimensional structure from primary amino acid sequence, and the subsequent inference of 
detailed aspects of function from structure is extremely complex and unlikely to be solved in the 
near future. Appellants first note that it is not necessary to determine a protein's
three-dimensional structure in order to ascertain its function; there are many proteins of well-known
functions whose structures have not yet been determined. Appellants also respectfully direct the 
Board's attention to Bowie et al. at page 1306, column 2, wherein the authors state that "proteins 
are surprisingly tolerant of amino acid substitutions," and that "at some positions, many different 
nonconservative substitutions were allowed." It is well-known in the art that natural selection 
tends to conserve those residues critical for protein structure and function during the course of 
evolution. This is why the study of a set of related sequences can indicate which residues are 
critical, since these are the ones which are conserved between sequences of different species (see 
Bowie et al., page 1306, and pages 1308-1309). Thus Bowie supports Appellants' assertions that 
significant sequence homology to a known kinesin light chain polypeptide coupled with the 
conservation of known active sites would lead one of skill in the art to conclude that KILCH is in 
fact a kinesin light chain polypeptide. 

In conclusion, none of the cited references serve to meet the burden of demonstrating that 
the skilled worker would find it more likely than not that the asserted utility for the claimed 
proteins as kinesin light chain homologs having kinesin light chain activity and an association 
with cancer was not correct. 

In addition, it is noted that, according to recent conversations with supervisory personnel 
in Technology Center 1600 of the USPTO, this aspect of the argument regarding the credibility
of homology-based assertion of function has been discredited. 

In fact, at a recent Biotechnology Customer Partnership Meeting held at the USPTO on
April 17, 2001, in a talk by Senior Examiner James Martinell, it was emphasized that Applicant's
assertion that his claimed protein "is a member of a family of proteins that already known based
upon amino acid sequence homology" can be effective as an assertion of utility for the claimed
sequence. According to Dr. Martinell, the proper question for the Examiner to ask, after
searching the prior art for the claimed protein, is "Would one of skill in the art accept that the
protein has been placed in the correct family of proteins as is asserted?" The "two" [sic: three]
possible answers that can be deduced from this prior art search are, according to Dr. Martinell:

• The search does not reveal any evidence that the family attribution made in the 
application is either incorrect or may be incorrect 

• The protein either more likely belongs to a family other than that asserted in
the application or likely does not belong to the family asserted in the application

• The search shows that the attribution is likely correct 

(From handouts of Dr. Martinell's slides distributed April 17, 2001; emphasis added)
It is clear from the above that the tactic taken by the Examiner in asserting the very slight
possibility that ANY minor sequence change might have a dramatic effect on the function of the
protein has been abandoned by the USPTO as a credible basis for a rejection under either the
utility requirement of 35 U.S.C. § 101 or the enablement requirement of § 112, first paragraph.

However, in any case, it is noted that the Examiner has failed to meet the above
requirements now recognized by the USPTO. He has cited no evidence particular to the claimed
protein, e.g., inconsistent findings deduced from his search, upon which to base any objection to
the assignment of functional homology to the family of kinesin light chain polypeptides. Indeed, 
there is no such evidence. 

Moreover, it must be remembered, as set forth in the USPTO's own M.P.E.P., that in 
order to raise such doubt in the veracity of Appellants' assertion, the Examiner must establish 
either (A) the logic underlying the assertion is seriously flawed, or (B) the facts upon which the 
assertion is based are inconsistent with the logic underlying the assertion. The Examiner has 
accomplished neither of these minimum standards. 

Accordingly, reversal of the utility rejection based on 35 U.S.C. § 101 is believed to be in
order.

IV. By Requiring the Patent Applicant to Assert a Particular or Unique Utility, the 
Patent Examination Utility Guidelines and Training Materials Applied by the 
Patent Examiner Misstate the Law 

There is an additional, independent reason to withdraw the rejections: to the extent the
rejections are based on the Revised Interim Utility Examination Guidelines (64 FR 71427,
December 21, 1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001)
and/or the Revised Interim Utility Guidelines Training Materials (USPTO Website
www.uspto.gov, March 1, 2000), the Guidelines and Training Materials are themselves
inconsistent with the law.

The Training Materials, which direct the Examiners regarding how to apply the Utility
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities:
"specific" utilities which meet the statutory requirements, and "general" utilities which do not.

The Training Materials define a "specific utility" as follows: 

A [specific utility] is specific to the subject matter claimed. This contrasts to general 
utility that would be applicable to the broad class of invention. For example, a claim to a 
polynucleotide whose use is disclosed simply as "gene probe" or "chromosome marker" 
would not be considered to be specific in the absence of a disclosure of a specific DNA 
target. Similarly, a general statement of diagnostic utility, such as diagnosing an 
unspecified disease, would ordinarily be insufficient absent a disclosure of what condition 
can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials, at page
52) as compared to the "broad class of invention." (In this regard, the Training Materials appear
to parallel the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility
Guidelines, 82 J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the
question to ask is whether or not a utility set forth in the specification is particular to the claimed
invention.")).

Such "unique" or "particular" utilities never have been required by the law. To meet the
utility requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397,
and confer a "specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible
"throwaway" utilities, such as trying to "patent a transgenic mouse by saying it makes great snake food,"
do not meet this standard. Karen Hall, Genomic Warfare, The American Lawyer 68 (June 2000)
(quoting John Doll, Chief of the Biotech Section of USPTO).

This does not preclude, however, a general utility, contrary to the statement in the 
Training Materials where "specific utility" is defined (page 5). Practical real-world uses are not 
limited to uses that are unique to an invention. The law requires that the practical utility be 
"definite," not particular. Montedison, 664 F.2d at 375. Appellants are not aware of any court
that has rejected an assertion of utility on the grounds that it is not "particular" or "unique" to the
specific invention. Where courts have found utility to be too "general," it has been in those cases
in which the asserted utility in the patent disclosure was not a practical use that conferred a
specific benefit. That is, a person of ordinary skill in the art would have been left to guess as to
how to benefit at all from the invention. In Kirk, for example, the CCPA held the assertion that a
man-made steroid had "useful biological activity" was insufficient where there was no
information in the specification as to how that biological activity could be practically used. Kirk,
376 F.2d at 941.

The fact that an invention can have a particular use does not provide a basis for requiring 
a particular use. See Brana, supra (disclosure describing a claimed antitumor compound as 
being homologous to an antitumor compound having activity against a "particular" type of cancer 
was determined to satisfy the specificity requirement). "Particularity" is not and never has been 
the sine qua non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long 
as a person of ordinary skill in the art would understand how to achieve a practical benefit from 
knowledge of the class. Only classes that encompass a significant portion of nonuseful members 
would fail to meet the utility requirement. Supra § II.B.2 (Montedison, 664 F.2d at 374-75).

The Training Materials fail to distinguish between broad classes that convey information 
of practical utility and those that do not, lumping all of them into the latter, unpatentable category 
of "general" utilities. As a result, the Training Materials paint with too broad a brush. Rigorous- 
ly applied, they would render unpatentable whole categories of inventions that heretofore have 
been considered to be patentable and that have indisputably benefitted the public, including the 
claimed invention. See supra § II.B. Thus, the Training Materials cannot be applied consistently
with the law. 
(9) CONCLUSION



Appellants respectfully submit that rejections for lack of utility based, inter alia, on an
allegation of lack of specificity as set forth in the 3/11/03 Office Action and as justified in the
Revised Interim and Final Utility Guidelines and Training Materials, are not supported in the law.
Neither are they scientifically correct, nor supported by any evidence or sound scientific
reasoning. These rejections are alleged to be founded on facts in court cases such as Brenner and
Kirk, yet those facts are clearly distinguishable from the facts of the instant application, and
indeed most if not all nucleotide and protein sequence applications. Nevertheless, the PTO is
attempting to mold the facts and holdings of these prior cases, "like a nose of wax,"^ to target 
rejections of claims to polypeptide and polynucleotide sequences, as well as to claims to methods
of detecting said polynucleotide sequences, where biological activity information has not been
proven by laboratory experimentation, and it has done so by ignoring perfectly acceptable
utilities fully disclosed in the specifications as well as well-established utilities known to those of
skill in the art. As is disclosed in the specification, and even more clearly, as one of ordinary skill
in the art would understand, the claimed invention has well-established, specific, substantial and
credible utilities. The rejections are, therefore, improper and should be reversed.

Moreover, to the extent the above rejections were based on the Revised Interim and Final
Examination Guidelines and Training Materials, those portions of the Guidelines and Training 
Materials that form the basis for the rejections should be determined to be inconsistent with the 
law. 



^ "The concept of patentable subject matter under §101 is not 'like a nose of wax which
may be turned and twisted in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v.
Flook, 198 USPQ 193 (US SupCt 1978))
Due to the urgency of this matter and its economic and public health implications, an
expedited review of this appeal is earnestly solicited. 

If the USPTO determines that any additional fees are due, the Commissioner is hereby
Respectfully submitted, 

INCYTE CORPORATION
Date:



Richard C. Ekstrom 
Reg. No. 37,027 

Direct Dial Telephone: (650) 843-7352 



Date:





1 Harris 
Reg. No. 44,743

Direct Dial Telephone: (650) 845-4866



3160 Porter Drive 
Palo Alto, California 94304
Phone: (650) 855-0555 
Fax: (650) 849-8886 



Enclosures: 

1. Brenner et al., Proc. Natl. Acad. Sci. U.S.A. 95:6073-78 (1998).

2. John C. Rockett et al., Differential gene expression in drug metabolism and toxicology:
practicalities, problems, and potential. Xenobiotica 29:655-691 (July 1999).

3. Emile F. Nuwaysir et al., Microarrays and Toxicology: The Advent of Toxicogenomics.
Molecular Carcinogenesis 24:153-159 (1999).

4. Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology: potentials
and limitations. Toxicology Letters 112-113:467-471 (2000).

5. Email from the primary investigator, Dr. Cynthia Afshari, to an Incyte employee, dated
July 3, 2000, as well as the original message to which she was responding.






09/036,614 



Docket No.: PF-0484-1 CPA 
APPENDIX: CLAIMS ON APPEAL

22. An isolated polynucleotide encoding a polypeptide selected from the group 
consisting of: 

a) a polypeptide comprising an amino acid sequence of SEQ ID NO:1,

b) a naturally occurring polypeptide comprising an amino acid sequence at least 90%
identical to an amino acid sequence of SEQ ID NO:1,

c) a biologically active fragment of a polypeptide having an amino acid sequence of SEQ
ID NO:1, and

d) an immunogenic fragment of a polypeptide having an amino acid sequence of SEQ ID
NO:1.

23. An isolated polynucleotide encoding a polypeptide of SEQ ID NO:1.

24. An isolated polynucleotide of claim 23 comprising the sequence of SEQ ID NO:2.

25. A recombinant polynucleotide comprising a promoter sequence operably linked to a 
polynucleotide of claim 22. 

26. A cell transformed with a recombinant polynucleotide of claim 25. 

27. A method for producing a polypeptide encoded by a polynucleotide of claim 22, the 
method comprising: 

a) culturing a cell under conditions suitable for expression of the polypeptide, wherein 
said cell is transformed with a recombinant polynucleotide, and said recombinant polynucleotide 
comprises a promoter sequence operably linked to a polynucleotide of claim 22, and

b) recovering the polypeptide so expressed. 
28. An isolated polynucleotide selected from the group consisting of:

a) a polynucleotide comprising a polynucleotide sequence of SEQ ID NO:2, 

b) a naturally occurring polynucleotide comprising a polynucleotide sequence at least 
90% identical to a polynucleotide sequence of SEQ ID NO:2,

c) a polynucleotide complementary to the polynucleotide of a), 

d) a polynucleotide complementary to the polynucleotide of b), and 

e) an RNA equivalent of a)-d). 

29. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a 
polynucleotide of claim 28. 
ABSTRACT Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation tested the programs BLAST [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448], and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of SSEARCH and FASTA are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by BLAST and WU-BLAST2 exaggerate significance by orders of magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20-30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.

Sequence database searching plays a role in virtually every branch of molecular biology and is crucial for interpreting the sequences issuing forth from genome projects. Given the method's central role, it is surprising that overall and relative capabilities of different procedures are largely unknown. It is difficult to verify algorithms on sample data because this requires large data sets of proteins whose evolutionary relationships are known unambiguously and independently of the methods being evaluated. However, nearly all known homologs have been identified by sequence analysis (the method to be tested). Also, it is generally very difficult to know, in the absence of structural data, whether two proteins that lack clear sequence similarity are unrelated. This has meant that although previous evaluations have helped improve sequence comparison, they have suffered from insufficient, imperfectly characterized, or artificial test data. Assessment also has been problematic because high quality database sequence searching attempts to have both sensitivity (detection of homologs) and specificity (rejection of unrelated proteins); however, these complementary goals are linked such that increasing one typically reduces the other.

Sequence comparison methodologies have evolved rapidly, so no previously published test has evaluated modern versions of programs commonly used. For example, parameters in BLAST (1) have changed, and WU-BLAST2 (2), which produces gapped alignments, has become available. The latest version of FASTA (3) previously tested was 1.6, but the current release (version 3.0) provides fundamentally different results in the form of statistical scoring.

The previous reports also have left gaps in our knowledge. For example, there has been no published assessment of thresholds for scoring schemes more sophisticated than percentage identity. Thus, the widely discussed statistical scoring measures have never actually been evaluated on large databases of real proteins. Moreover, the different scoring schemes commonly in use have not been compared.

Beyond these issues, there is a more fundamental question: in an absolute sense, how well does pairwise sequence comparison work? That is, what fraction of homologous proteins can be detected using modern database searching methods?

In this work, we attempt to answer these questions and to overcome both of the fundamental difficulties that have hindered assessment of sequence comparison methodologies. First, we use the set of distant evolutionary relationships in the SCOP: Structural Classification of Proteins database (4), which is derived from structural and functional characteristics (5). The SCOP database provides a uniquely reliable set of homologs, which are known independently of sequence comparison. Second, we use an assessment method that jointly measures both sensitivity and specificity. This method allows straightforward comparison of different sequence searching procedures. Further, it can be used to aid interpretation of real database searches and thus provide optimal and reliable results.

Previous Assessments of Sequence Comparison. Several previous studies have examined the relative performance of different sequence comparison methods. The most encompassing analyses have been by Pearson (6, 7), who compared the three most commonly used programs. Of these, the Smith-Waterman algorithm (8) implemented in SSEARCH (3) is the oldest and slowest but the most rigorous. Modern heuristics have provided BLAST (1) the speed and convenience to make it the most popular program. Intermediate between these two is FASTA (3), which may be run in two modes offering either greater speed (ktup = 2) or greater effectiveness (ktup = 1). Pearson also considered different parameters for each of these programs. Pearson selected two representative proteins from each of 67 protein superfamilies defined by the PIR database, and the matched proteins were marked as being homologous or unrelated according to their membership of PIR superfamilies. Pearson found that modern matrices and "ln-scaling" of raw scores improve results considerably. He also reported that the rigorous Smith-Waterman algorithm worked slightly better than FASTA, which was in turn more effective than BLAST.

Very large scale analyses of matrices have been performed (10), and Henikoff and Henikoff (11) also evaluated the effectiveness of BLAST and FASTA. Their test with BLAST considered the ability to detect homologs above a predetermined score but had no penalty for methods which also reported large numbers of spurious matches. The Henikoffs searched the SWISS-PROT database (12) and used PROSITE (13) to define homologous families. Their results showed that the BLOSUM62 matrix (14) performed markedly better than the extrapolated PAM-series matrices (15), which previously had been popular.

A crucial aspect of any assessment is the data that are used to test the ability of the program to find homologs. But in Pearson's and the Henikoffs' evaluations of sequence comparison, the correct results were effectively unknown. This is because the superfamilies in PIR and PROSITE are principally created by using the same sequence comparison methods which are being evaluated. This interdependency of data and methods creates a "chicken and egg" problem, and means, for example, that new methods would be penalized for correctly identifying homologs missed by older programs. For instance, immunoglobulin variable and constant domains are clearly homologous, but PIR places them in different superfamilies. The problem is widespread: each superfamily in PIR 48.00 with a structural homolog is itself homologous to an average of 1.6 other PIR superfamilies (16).

To surmount these sorts of difficulties, Sander and Schneider (17) used protein structures to evaluate sequence comparison. Rather than comparing different sequence comparison algorithms, their work focused on determining a length-dependent threshold of percentage identity, above which all proteins would be of similar structure. A result of this analysis was the HSSP equation; it states that proteins with 25% identity over 80 residues will have similar structures, whereas shorter alignments require higher identity. (Other studies also have used structures (18-20), but these focused on a small number of model proteins and were principally oriented toward evaluating alignment accuracy rather than homology detection.)

A general solution to the problem of scoring comes from statistical measures (i.e., E-values and P-values) based on the extreme value distribution (21). Extreme value scoring was implemented analytically in the BLAST program using the Karlin and Altschul statistics (22, 23), and empirical approaches have been recently added to FASTA and SSEARCH. In addition to being heralded as a reliable means of recognizing significantly similar proteins (24, 25), the mathematical tractability of statistical scores "is a crucial feature of the BLAST algorithm" (1). The validity of this scoring procedure has been tested analytically and empirically (see ref. 2 and references in ref. 24). However, all large empirical tests used random sequences that may lack the subtle structure found within biological sequences (26, 27) and obviously do not contain any real homologs. Thus, although many researchers have suggested that statistical scores be used to rank matches (24, 25, 28), there have been no large rigorous experiments on biological data to determine the degree to which such rankings are superior.
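For orientation, the extreme value statistics referred to above have a standard closed form for local alignment scores (the Karlin-Altschul result underlying BLAST). As a sketch only, with λ and K fitted to the substitution matrix and gap costs, and m and n the effective lengths of the query and the database, the expected number of chance alignments scoring at least S in a single search is:

```latex
E(S) = K \, m \, n \, e^{-\lambda S}
```

The expression shows how a statistical score folds sequence length and database size into a single number, which a raw score cannot do. As the text notes, FASTA and SSEARCH obtain their statistics empirically, so the formula illustrates the idea rather than reproducing any one program's exact computation.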

Since the discovery that the structures of hemoglobin and myoglobin are very similar though their sequences are not (29), it has been apparent that comparing structures is a more powerful (if less convenient) way to recognize distant evolutionary relationships than comparing sequences. If two proteins show a high degree of similarity in their structural details and function, it is very probable that they have an evolutionary relationship though their sequence similarity may be low.

The recent growth of protein structure information combined with the comprehensive evolutionary classification in the SCOP database (4, 5) have allowed us to overcome previous limitations. With these data, we can evaluate the performance of sequence comparison methods on real protein sequences whose relationships are known confidently. The SCOP database uses structural information to recognize distant homologs, the large majority of which can be determined unambiguously. These superfamilies, such as the globins or the immunoglobulins, would be recognized as related by the vast majority of the biological community despite the lack of high sequence similarity.

From SCOP, we extracted the sequences of domains of proteins in the Protein Data Bank (PDB) (30) and created two databases. One (PDB90D-B) has domains which were all <90% identical to any other, whereas the other (PDB40D-B) had those <40% identical. The databases were created by first sorting all protein domains in SCOP by their quality and making a list. The highest quality domain was selected for inclusion in the database and removed from the list. Also removed from the list (and discarded) were all other domains above the threshold level of identity to the selected domain. This process was repeated until the list was empty. The PDB40D-B database contains 1,323 domains, which have 9,044 ordered pairs of distant relationships, or 0.5% of the total 1,749,006 ordered pairs. In PDB90D-B, the 2,079 domains have 53,988 relationships, representing 1.2% of all pairs. Low complexity regions of sequence can achieve spurious high scores, so these were masked in both databases by processing with the SEG program (27) using recommended parameters: 12 1.8 2.0. The databases used in this paper are available from http://sss.stanford.edu/sss/, and databases derived from the current version of SCOP may be found at http://scop.mrc-lmb.cam.ac.uk/scop/.

Analyses from both databases were generally consistent, but PDB40D-B focuses on distantly related proteins and reduces the heavy overrepresentation in the PDB of a small number of families (31, 32), whereas PDB90D-B (with more sequences) improves evaluations of statistics. Except where noted otherwise, the distant homolog results here are from PDB40D-B. Although the precise numbers reported here are specific to the structural domain databases used, we expect the trends to be general.

Assessment Data and Procedure. Our assessment of sequence comparison may be divided into four different major categories of tests. First, using just a single sequence comparison algorithm at a time, we evaluated the effectiveness of different scoring schemes. Second, we assessed the reliability of scoring procedures, including an evaluation of the validity of statistical scoring. Third, we compared sequence comparison algorithms (using the optimal scoring scheme) to determine their relative performance. Fourth, we examined the distribution of homologs and considered the power of pairwise sequence comparison to recognize them. All of the analyses used the databases of structurally identified homologs and a new assessment criterion.

The analyses tested BLAST (1), version 1.4.9MP, and WU-BLAST2 (2), version 2.0a13MP. Also assessed was the FASTA package, version 3.0 (3), which provided FASTA and the SSEARCH implementation of Smith-Waterman (8). For SSEARCH and FASTA, we used BLOSUM45 with gap penalties
BLOSUM62 were used for BLAST and WU-BLAST2.

The "Coverage vs. Error" Plot. To test a particular protocol


(i.e., comparison program and scoring scheme), each sequence from the database was used as a query to search the database. This yielded ordered pairs of query and target sequences with associated scores, which were sorted, on the basis of their scores, from best to worst. The ideal method would have perfect separation, with all of the homologs at the top of the list and unrelated proteins below. In practice, perfect separation is impossible to achieve, so instead one is interested in drawing a threshold above which there are the largest number of related pairs of sequences consistent with an acceptable error rate.

Fig. 1. Coverage vs. error plots of different scoring schemes for SSEARCH Smith-Waterman. (A) Analysis of PDB40D-B database. (B) Analysis of PDB90D-B database. All of the proteins in the database were compared with each other using the SSEARCH program. The results of this single set of comparisons were considered using five different scoring schemes and assessed. The graphs show the coverage and errors per query (EPQ) for statistical scores, raw scores, and three measures using percentage identity. In the coverage vs. error plot, the x axis indicates the fraction of all homologs in the database (known from structure) which have been detected. Precisely, it is the number of detected pairs of proteins with the same fold divided by the total number of pairs from a common superfamily. PDB40D-B contains a total of 9,044 homologs, so a score of 10% indicates identification of 904 relationships. The y axis reports the number of EPQ. Because there are 1,323 queries made in the PDB40D-B all-vs.-all comparison, 13 errors corresponds to 0.01 or 1% EPQ. The y axis is presented on a log scale to show results over the widely varying degrees of accuracy which may be desired. The scores that correspond to the levels of EPQ and coverage are shown in Fig. 4 and Table 1. The graph demonstrates the trade-off between sensitivity and selectivity. As more homologs are found (moving to the right), more errors are made (moving up). The ideal method would be in the lower right corner of the graph, which corresponds to identifying many evolutionary relationships without selecting unrelated proteins. Three measures of percentage identity are plotted. Percentage identity within alignment is the degree of identity within the aligned region of the proteins, without consideration of the alignment length. Percentage identity within both is the number of identical residues in the aligned region as a percentage of the average length of the query and target proteins. The HSSP equation (H) is $H = 290.15\,l^{-0.562}$, where $l$ is length for $10 < l < 80$; $H = 100$ for $l < 10$; $H = 24.7$ for $l > 80$. The percentage identity HSSP-adjusted score is the percent identity within the alignment minus H. Smith-Waterman raw scores and E-values were taken directly from the sequence comparison program.

Our procedure involved measuring the coverage and error for every threshold. Coverage was defined as the fraction of structurally determined homologs that have scores above the selected threshold; this reflects the sensitivity of a method. Errors per query (EPQ), an indicator of selectivity, is the number of nonhomologous pairs above the threshold divided by the number of queries. Graphs of these data, called coverage vs. error plots, were devised to understand how protocols compare at different levels of accuracy. These graphs share effectively all of the beneficial features of Receiver Operating Characteristic (ROC) plots (33, 34) but better represent the high degrees of accuracy required in sequence comparison and the huge background of nonhomologs.
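As a minimal sketch of this procedure (not the authors' code), the function below takes every reported query/target pair as a `(score, is_homolog)` tuple, where `is_homolog` would come from SCOP superfamily membership, sweeps the threshold from the best score to the worst, and returns coverage (fraction of all known homologous pairs found) and EPQ (false pairs divided by the number of queries) at each distinct threshold. It assumes lower scores are better, as with E-values; pass `lower_is_better=False` for raw scores. The function name and toy data are hypothetical.

```python
from typing import List, Tuple


def coverage_vs_error(
    pairs: List[Tuple[float, bool]],
    n_homolog_pairs: int,
    n_queries: int,
    lower_is_better: bool = True,
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, EPQ) at every distinct score threshold.

    pairs           -- (score, is_homolog) for each ordered query/target pair
    n_homolog_pairs -- total homologous ordered pairs known in the database
    n_queries       -- number of sequences used as queries
    """
    # Best scores first: ascending for E-values, descending for raw scores.
    ranked = sorted(pairs, key=lambda p: p[0], reverse=not lower_is_better)
    points = []
    true_hits = false_hits = 0
    for i, (score, is_homolog) in enumerate(ranked):
        if is_homolog:
            true_hits += 1
        else:
            false_hits += 1
        # Tied scores share one threshold, so emit a point only after the last tie.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((score, true_hits / n_homolog_pairs, false_hits / n_queries))
    return points


# Hypothetical toy search: 2 queries, 4 known homologous pairs in total.
toy = [(1e-10, True), (1e-5, True), (0.002, False), (0.01, True), (0.5, False)]
for threshold, coverage, epq in coverage_vs_error(toy, n_homolog_pairs=4, n_queries=2):
    print(f"E <= {threshold:g}: coverage {coverage:.2f}, EPQ {epq:.2f}")
```

Plotting coverage on the x axis against EPQ on a log-scaled y axis gives a coverage vs. error curve. As a check on scale, 13 false pairs over the 1,323 PDB40D-B queries gives an EPQ of about 0.0098, i.e. roughly the 1% level used throughout the text.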

This assessment procedure is directly relevant to practical sequence database searching, for it provides precisely the information necessary to perform a reliable sequence database search. The EPQ measure places a premium on score consistency; that is, it requires scores to be comparable for different queries. Consistency is an aspect which has been largely ignored in previous tests but is essential for the straightforward or automatic interpretation of sequence comparison results. Further, it provides a clear indication of the confidence that should be ascribed to each match. Indeed, the EPQ measure should approximate the expectation value reported by database searching programs, if the programs' estimates are accurate.

Fig. 2. Hemoglobin β-chain (PDB code 1hds chain b, ref. 38, Left) and cellulase E2 (PDB code 1tml, ref. 39, Right) have 39% identity over 64 residues, a level which has often been believed to indicate homology. Despite this high degree of identity, their structures strongly suggest that these proteins are not related. Appropriately, neither the raw alignment score of 85 nor the E-value of 1.3 is significant. Proteins rendered by RASMOL (40).

Fig. 3. Length and percentage identity of alignments of unrelated proteins in PDB40D-B. Each pair of nonhomologous proteins found with SSEARCH is plotted as a point whose position indicates the length and the percentage identity within the alignment. Because alignment length and percentage identity are quantized, many pairs of proteins may have exactly the same alignment length and percentage identity. The line shows the HSSP threshold (though it is intended to be applied with a different matrix and parameters).

Fig. 4. Reliability of statistical scores in PDB90D-B. Each line shows the relationship between reported statistical score and actual error rate for a different program. E-values are reported for SSEARCH and FASTA, whereas P-values are shown for BLAST and WU-BLAST2. If the scoring were perfect, then the number of errors per query and the E-values would be the same, as indicated by the upper bold line. (P-values should be the same as EPQ for small numbers and diverge at higher values, as indicated by the lower bold line.) E-values from SSEARCH and FASTA are shown to have good agreement with EPQ but underestimate the significance slightly. BLAST and WU-BLAST2 are overconfident, with the degree of exaggeration dependent upon the score. The results for PDB40D-B were similar to those for PDB90D-B despite the difference in number of homologs detected. This graph could be used to roughly calibrate the reliability of a given statistical score.
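The calibration described here can be sketched in the same spirit: for a few reported E-value cut-offs, count the nonhomologous pairs actually scoring at or below each cut-off, divide by the number of queries, and compare that observed EPQ with the cut-off itself. A ratio near 1 means the reported scores are accurate, a ratio below 1 means they are conservative, and a ratio well above 1 means they exaggerate significance. The helper name and data are hypothetical; for P-values the comparison is only meaningful at small values, where P and EPQ should agree.

```python
from typing import Iterable, List, Tuple


def calibrate(
    pairs: Iterable[Tuple[float, bool]],
    n_queries: int,
    cutoffs: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, observed EPQ, observed/reported ratio) for each cutoff."""
    pairs = list(pairs)
    rows = []
    for cutoff in cutoffs:
        # Nonhomologous pairs that the program would report as significant.
        false_hits = sum(
            1 for score, is_homolog in pairs if score <= cutoff and not is_homolog
        )
        observed_epq = false_hits / n_queries
        rows.append((cutoff, observed_epq, observed_epq / cutoff))
    return rows


# Hypothetical output from 10 queries: (reported E-value, truly homologous?)
toy = [(0.001, False), (0.005, True), (0.008, False), (0.02, False), (0.3, True)]
for cutoff, epq, ratio in calibrate(toy, n_queries=10, cutoffs=[0.01, 0.1]):
    print(f"reported {cutoff:g} -> observed EPQ {epq:g} (ratio {ratio:.1f})")
```

With these toy numbers the observed error rate at a reported 0.01 is twenty times higher than claimed, which would indicate overconfident scores of the kind the caption describes for BLAST and WU-BLAST2.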

The Performance of Scoring Schemes. All of the programs tested could provide three fundamental types of scores. The first score is the percentage identity, which may be computed in several ways based on either the length of the alignment or the lengths of the sequences. The second is a "raw" or "Smith-Waterman" score, which is the measure optimized by the Smith-Waterman algorithm and is computed by summing the substitution matrix scores for each position in the alignment and subtracting gap penalties. In BLAST, a measure related to this score is scaled into bits. Third is a statistical score based on the extreme value distribution. These results are summarized in Fig. 1.
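To make the "raw" score concrete, the sketch below scores an alignment that has already been computed, summing per-column substitution scores and subtracting gap penalties. It is not Smith-Waterman itself (which searches for the best-scoring local alignment), and it does not use a real matrix: the match/mismatch values and the affine gap rule (`gap_open` for the first column of a gap, `gap_extend` for each further column of the same gap) are hypothetical stand-ins for a matrix such as BLOSUM62 and a program's gap parameters.

```python
def raw_alignment_score(
    a: str,
    b: str,
    match: int = 5,
    mismatch: int = -4,
    gap_open: int = 10,
    gap_extend: int = 1,
) -> int:
    """Raw score of a pre-computed alignment; '-' marks a gap column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in = None  # which sequence ('a' or 'b') holds the currently open gap
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # First column of a gap pays the opening cost, later ones the extension.
            score -= gap_extend if gap_in == side else gap_open
            gap_in = side
        else:
            score += match if x == y else mismatch
            gap_in = None
    return score


print(raw_alignment_score("HEAGAWGHEE", "HEA--WGHEE"))  # 8 matches, one 2-column gap -> 29
```

Because the total depends on the matrix values, the gap rule, and how long the alignment happens to be, the same raw number means different things in different searches, which is the difficulty the text goes on to describe.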

Sequence Identity. Though it has been long established that percentage identity is a poor measure (35), there is a common rule-of-thumb stating that 30% identity signifies homology. Moreover, publications have indicated that 25% identity can be used as a threshold (17, 36). We find that these thresholds, originally derived years ago, are not supported by present results. As databases have grown, so have the possibilities for chance alignments with high identity; thus, the reported cutoffs lead to frequent errors. Fig. 2 shows one of the many pairs of proteins with very different structures that nonetheless have high levels of identity over considerable aligned regions. Despite the high identity, the raw and the statistical scores for such incorrect matches are typically not significant. The principal reasons percentage identity does so poorly seem to be that it ignores information about gaps and about the conservative or radical nature of residue substitutions.

From the PDB40D-B analysis in Fig. 3, we learn that 30% identity is a reliable threshold for this database only for sequence alignments of at least 150 residues. Because one unrelated pair of proteins has 43% identity over 62 residues, it is probably necessary for alignments to be at least 70 residues in length before 40% is a reasonable threshold, for a database of this particular size and composition.

At a given reliability, scores based on percentage identity detect just a fraction of the distant homologs found by statistical scoring. If one measures the percentage identity in the aligned regions without consideration of alignment length, then a negligible number of distant homologs are detected. Use of the HSSP equation improves the value of percentage identity, but even this measure can find only 4% of all known homologs at 1% EPQ. In short, percentage identity discards most of the information measured in a sequence comparison.
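The three identity-based measures compared here can be sketched from the definitions in the Fig. 1 caption: identity within the alignment, identity relative to the average length of the two proteins ("within both"), and the HSSP-adjusted score, which is identity within the alignment minus the HSSP curve $H(l) = 290.15\,l^{-0.562}$ (100 below 10 residues, 24.7 above 80). Two choices are assumptions of this sketch: the alignment length counts every aligned column including gaps, and the boundary lengths 10 and 80 are assigned to the power-law branch. Function names are hypothetical.

```python
def hssp_threshold(length: int) -> float:
    """HSSP identity threshold H(l), in percent, for an alignment of `length` columns."""
    if length < 10:
        return 100.0
    if length <= 80:  # assumption: l = 10 and l = 80 use the power-law branch
        return 290.15 * length ** -0.562
    return 24.7


def identity_measures(a: str, b: str, query_len: int, target_len: int):
    """Return (% identity within alignment, % identity within both, HSSP-adjusted)."""
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    aln_len = len(a)  # assumption: gap columns count toward alignment length
    within_alignment = 100.0 * identical / aln_len
    within_both = 100.0 * identical / ((query_len + target_len) / 2)
    hssp_adjusted = within_alignment - hssp_threshold(aln_len)
    return within_alignment, within_both, hssp_adjusted


# The Fig. 2 pair: 39% identity over 64 residues. H(64) is about 28.0, so this
# structurally unrelated pair sits roughly 11 points above the HSSP curve.
print(round(hssp_threshold(64), 1), round(39 - hssp_threshold(64), 1))
```

Each measure is computed from the same two aligned strings, yet none of them uses the substitution or gap information that raw and statistical scores draw on, which is consistent with the text's point that identity discards most of what a comparison measures.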

Raw Scores. Smith-Waterman raw scores perform better than percentage identity (Fig. 1), but ln-scaling (7) provided no notable benefit in our analysis. It is necessary to be very precise when using either raw or bit scores because a small change in cutoff score could yield a tenfold difference in EPQ. However, it is difficult to choose appropriate thresholds because the reliability of a bit score depends on the lengths of the proteins matched and the size of the database. Raw score thresholds also are affected by matrix and gap parameters.
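Why a fixed bit-score cutoff behaves differently from search to search can be seen from the standard Karlin-Altschul relations that BLAST-style bit scores are built on (ignoring edge-length corrections): $S' = (\lambda S - \ln K)/\ln 2$ and $E = m\,n\,2^{-S'}$, where m and n are the query and database lengths. The same bit score therefore maps to an E-value proportional to the search space. The default λ and K below are illustrative values of the kind BLAST reports for BLOSUM62 with gap costs 11/1; they are assumptions, not parameters from this study.

```python
import math


def bit_score(raw: float, lam: float = 0.267, k: float = 0.041) -> float:
    """Convert a raw score to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Expected chance hits with at least this bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# One hypothetical 40-bit hit from a 300-residue query, evaluated against
# databases of two hypothetical sizes (in residues).
for db_len in (10**6, 10**8):
    print(db_len, evalue_from_bits(40.0, 300, db_len))
```

The hundredfold larger database turns the same 40-bit score from an E-value near 3e-4 into one near 0.03, so a threshold chosen on bit scores alone would admit very different error rates in the two searches.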

Statistical Scores. Statistical scores were introduced partly to overcome the problems that arise from raw scores. This scoring scheme provides the best discrimination between homologous proteins and those which are unrelated. Most likely, its power can be attributed to its incorporation of more information than any other measure: it takes account of the full substitution and gap data (like raw scores) but also has details about the sequence lengths and composition and is scaled appropriately.

We find that statistical scores are not only powerful, but also easy to interpret. SSEARCH and FASTA show close agreement between statistical scores and actual number of errors per query (Fig. 4). The expectation value score gives a good, slightly conservative estimate of the chances of the two sequences being found at random in a given query. Thus, an E-value of 0.01 indicates that roughly one pair of nonhomologs of this similarity should be found in every 100 different queries. Neither raw scores nor percentage identity can be interpreted in this way, and these results validate the suitability of the extreme value distribution for describing the scores from a database search.
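The interpretation in this paragraph is simple arithmetic, sketched below under two stated assumptions: that a program's E-values are accurate, as the text finds for SSEARCH and FASTA, and that chance hits follow the usual Poisson model, so the P-value of a score is the probability of one or more chance hits, P = 1 - e^(-E). Function names are hypothetical.

```python
import math


def expected_false_positives(evalue_cutoff: float, n_queries: int) -> float:
    """Chance (nonhomologous) hits expected across n_queries searches at this cutoff."""
    return evalue_cutoff * n_queries


def pvalue_from_evalue(evalue: float) -> float:
    """Poisson probability of at least one chance hit: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-evalue)


print(expected_false_positives(0.01, 100))  # about one false pair per 100 queries
for e in (0.001, 0.01, 1.0, 10.0):
    # P tracks E while E is small, then saturates toward 1 as E grows.
    print(e, round(pvalue_from_evalue(e), 4))
```

This is why the Fig. 4 caption expects P-values to match EPQ only for small numbers: at E = 0.01 the two differ by about 0.5%, whereas at E = 10 the P-value cannot exceed 1.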

The P-values from BLAST also should be directly interpretable but were found to overstate significance by more than two orders of magnitude for 1% EPQ for this database. Nonetheless, these results strongly suggest that the analytic theory is fundamentally appropriate. WU-BLAST2 scores were more reliable than those from BLAST, but also exaggerate expected confidence by more than an order of magnitude at 1% EPQ.

Overall Detection of Homologs and Comparison of Algorithms. The results in Fig. 5A and Table 1 show that pairwise sequence comparison is capable of identifying only a small fraction of the homologous pairs of sequences in PDB40D-B. Even SSEARCH with E-values, the best protocol tested, could find only 18% of all relationships at a 1% EPQ. BLAST, which identifies 15%, was the worst performer, whereas FASTA ktup = 1 is nearly as effective as SSEARCH. FASTA ktup = 2 and WU-BLAST2 are intermediate in their ability to detect homologs. Comparison of different algorithms indicates that those capable of identifying more homologs are generally slower. SSEARCH is 25 times slower than BLAST and 6.5 times slower than FASTA ktup = 1. WU-BLAST2 is slightly faster than FASTA ktup = 2, but the latter has more interpretable scores.
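Coverage at a fixed errors-per-query (EPQ) level, as reported in Fig. 5A and Table 1, can be computed in three steps:

1. Rank all scored pairs from most to least significant.
2. Walk down the list, counting homologous and nonhomologous pairs.
3. Stop when the number of nonhomologous pairs divided by the number of queries would exceed the threshold.

The sketch below is a minimal illustration under stated assumptions. Each hit is a hypothetical (e_value, is_homolog) tuple, a lower E-value ranks higher, ties are ignored, and the total number of true homologous pairs is known. The example data are invented.

```python
from typing import Iterable, Tuple

def coverage_at_epq(hits: Iterable[Tuple[float, bool]],
                    n_queries: int,
                    n_true_pairs: int,
                    epq: float = 0.01) -> float:
    """Fraction of all homologous pairs found before errors per query exceed `epq`.

    hits: (e_value, is_homolog) for every pair reported by the search.
    Pairs are ranked from most to least significant (lowest E-value first);
    EPQ is taken as (number of nonhomologous pairs accepted) / n_queries.
    """
    max_errors = epq * n_queries
    found = 0
    errors = 0
    for _, is_homolog in sorted(hits, key=lambda h: h[0]):
        if is_homolog:
            found += 1
        else:
            errors += 1
            if errors > max_errors:
                break  # accepting this pair would push EPQ past the threshold
    return found / n_true_pairs

# Hypothetical search results for 100 queries, 10 true homologous pairs in total.
results = [(1e-10, True), (1e-8, True), (1e-5, False),
           (1e-4, True), (1e-3, False), (1e-2, True)]
print(coverage_at_epq(results, n_queries=100, n_true_pairs=10))  # 0.3
```

With 100 queries, a 1% EPQ allows one false pair. The walk therefore stops at the second nonhomolog, after three of the ten homologous pairs have been found.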

In PDB90D-B, where there are many close relationships, the best method can identify only 38% of structurally known homologs (Fig. 5B). The method which finds that many relationships is WU-BLAST2, whereas SSEARCH performed best on PDB40D-B. Because the best-performing program differs between the two databases, we infer that the differences between the FASTA ktup = 1, SSEARCH and WU-BLAST2 programs are unlikely to be significant when compared with variation in database composition and scoring reliability.

Fig. 6 helps to explain why most distant homologs cannot be found by sequence comparison: a great many such relationships have no more sequence identity than would be expected by chance. SSEARCH with E-values can recognize >90% of the homologous pairs with 30-40% identity. In this region, there are 30 pairs of homologous proteins that do not have significant E-values, but 26 of these involve sequences with <50 residues. Of sequences having 25-30% identity, 75% are identified by SSEARCH E-values. However, although the number of homologs grows at lower levels of identity, the detection falls off sharply: only 40% of homologs with 20-25% identity are detected and only 10% of those with 15-20% can be found. These results show that statistical scores can find related proteins whose identity is remarkably low; however, the power of the method is restricted by the great divergence of many protein sequences.

Fig. 6. Distribution and detection of homologs in PDB40D-B. Bars show the distribution of homologous pairs according to their identity (using the measure of identity in both). Filled regions indicate the number of these pairs found by the best database searching method (SSEARCH with E-values) at 1% EPQ. The PDB40D-B database contains proteins with <40% identity, and as shown on this graph, most structurally identified homologs in the database have diverged extremely far in sequence and have <20% identity. Note that the alignments may be inaccurate, especially at low levels of identity. Filled regions show that SSEARCH can identify most relationships that have 25% or more identity, but its detection wanes sharply below 25%. Consequently, the great sequence divergence of most structurally identified evolutionary relationships effectively defeats the ability of pairwise sequence comparison to detect them.
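The identity-binned analysis summarized in Fig. 6 can be sketched in three steps:

1. Compute the percentage identity of each aligned homologous pair.
2. Group the pairs into 5% identity bins.
3. Report the fraction detected in each bin.

This is an illustration only. Identity here is taken over aligned positions where neither sequence has a gap, which is just one of the possible definitions; the "within both" and HSSP-scaled measures compared in Table 1 are not reproduced. The example pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y)
    return 100.0 * same / len(cols)

def detection_by_identity(pairs: Iterable[Tuple[float, bool]],
                          bin_width: int = 5) -> Dict[Tuple[int, int], float]:
    """Fraction of homologous pairs detected within each identity bin.

    pairs: (percent_identity, detected) for each homologous pair.
    Bins are half-open intervals [low, low + bin_width).
    """
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    hits: Dict[Tuple[int, int], int] = defaultdict(int)
    for identity, detected in pairs:
        low = int(identity // bin_width) * bin_width
        key = (low, low + bin_width)
        totals[key] += 1
        hits[key] += int(detected)
    return {k: hits[k] / totals[k] for k in sorted(totals)}

examples = [(22.0, True), (23.0, False), (17.0, False), (18.0, True), (19.0, False)]
print(percent_identity("ACDE-", "ACDF-"))   # 75.0
print(detection_by_identity(examples))
```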

After completion of this work, a new version of pairwise BLAST was released: BLASTGP (37). It supports gapped alignments, like WU-BLAST2, and dispenses with sum statistics. Our initial tests on BLASTGP using default parameters show that its E-values are reliable and that its overall detection of homologs was substantially better than that of ungapped BLAST, but not quite equal to that of WU-BLAST2.

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27 and references therein) suggests that the most effective sequence searches are made by (i) using a large current database in which the protein sequences have been complexity masked and (ii) using statistical scores to interpret the results. Our experiments fully support this view.

Our results also suggest two further points. First, the E-values reported by FASTA and SSEARCH give fairly accurate estimates of the significance of each match, but the P-values provided by BLAST and WU-BLAST2 underestimate the true



Table 1. Summary of sequence comparison methods with PDB40D-B

Method                                   Relative time*
SSEARCH % identity: within alignment
SSEARCH % identity: within both
SSEARCH % identity: HSSP-scaled
SSEARCH Smith-Waterman raw scores
SSEARCH E-values
FASTA ktup 1 E-values
FASTA ktup 2 E-values
WU-BLAST2 P-values
BLAST P-values

*Times are from large database searches with genome proteins.
1. An important feature of the work of many molecular biologists is identifying which genes are switched on and off in a cell under different environmental conditions or subsequent to xenobiotic challenge. Such information has many uses, including the deciphering of molecular pathways and facilitating the development of new experimental and diagnostic procedures. However, the student of gene hunting should be forgiven for perhaps becoming confused by the mountain of information available, as there appear to be almost as many methods of discovering differentially expressed genes as there are research groups using the technique.

2. The aim of this review was to clarify the main methods of differential gene expression analysis and the mechanistic principles underlying them. Also included is a discussion of some of the practical aspects of using this technique. Emphasis is placed on the so-called 'open' systems, which require no prior knowledge of the genes contained within the study model. Whilst these will eventually be replaced by 'closed' systems in the study of human, mouse and other commonly studied laboratory animals, they will remain a powerful tool for those examining less fashionable models.

3. The use of suppression-PCR subtractive hybridization is exemplified in the identification of up- and down-regulated genes in rat liver following exposure to phenobarbital, a well-known inducer of the drug metabolizing enzymes.


4. Differential gene display provides a coherent platform for building libraries and microchip arrays of gene 'fingerprints' characteristic of known enzyme inducers and xenobiotic toxicants, which may be interrogated subsequently for the identification and characterization of xenobiotics of unknown biological properties.
Introduction

It is now apparent that the development of almost all cancers and many non-neoplastic diseases is accompanied by altered gene expression in the affected cells compared to their normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein and Kinzler 1993, Semenza 1994, Cassidy 1995, Kleinjan and Van Heyningen 1998). Such changes also occur in response to external stimuli such as pathogenic microorganisms (Rohn et al. 1996, Singh et al. 1997, Griffin and Krishna 1998, Lunney 1998) and xenobiotics (Sewall et al. 1995, Dogra et al. 1998, Ramana and Kohli 1998), as well as during the development of undifferentiated cells (Hecht 1998, Rudin and Thompson 1998, Schneider-Maunoury et al. 1998). The potential medical and therapeutic benefits of understanding the molecular changes which occur in a given cell in progressing from the normal to the 'altered' state are enormous. Such profiling essentially provides a 'fingerprint' of each step of a cell's development or response and should help in the elucidation of specific and sensitive biomarkers representing, for example, different types of cancer or previous exposure to certain classes of chemicals that are enzyme inducers.

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including the well-characterized isoforms of cytochrome P450) are inducible by drugs and chemicals in man (Pelkonen et al. 1998). Induction predominantly involves transcriptional activation not only of the cognate cytochrome P450 genes but also of additional cellular proteins which may be crucial to the phenomenon of induction. Accordingly, the development of methodology to identify and assess the full complement of genes that are either up- or down-regulated by inducers is crucial in the development of knowledge to understand the precise molecular mechanisms of enzyme induction and how this relates to drug action.

Similarly, in the field of chemical-induced toxicity, it is now becoming increasingly obvious that most adverse reactions to drugs and chemicals are the result of multiple gene regulation, some of which is causal and some of which is casually related to the toxicological phenomenon per se. This observation has led to an upsurge in interest in gene-profiling technologies which differentiate between the control and toxin-treated gene pools in target tissues and are, therefore, of value in rationalizing the molecular mechanisms of xenobiotic-induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is not solely an academic pursuit, as much interest has been generated in the pharmaceutical industry to harness this technology in the early identification of toxic drug candidates, thereby shortening the developmental process and contributing substantially to the safety assessment of new drugs. For example, if the gene profile in response to, say, a testicular toxin that has been well-characterized in vivo could be determined in the testis, then this profile would be representative of all new drug candidates which act via this specific molecular mechanism of toxicity, thereby providing a useful and coherent approach to the early detection of such toxicants.

Whereas it would be informative to know the identity and functionality of all genes up- or down-regulated by such toxicants, this would appear a longer-term goal, as the majority of human genes have not yet been sequenced, far less their functionality determined. However, the current use of gene profiling yields a pattern of gene changes for a xenobiotic of unknown toxicity which may be matched to that of well-characterized toxins. This alerts the toxicologist to possible in vivo similarities between the unknown and the standard, thereby providing a platform for more extensive toxicological examination. Such approaches are beginning to gain momentum, in that several biotechnology companies are commercially producing 'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of xenobiotics. These chips consist of hundreds or thousands of genes, some of which are degenerate in the sense that not all of the genes are mechanistically related to any one toxicological phenomenon. Whereas these chips are useful in broad-spectrum screening, they are maturing at a substantial rate, such that gene arrays are now becoming more specific, e.g. chips for the identification of changes in growth factor families that contribute to the aetiology and development of chemically-induced neoplasia.
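Matching the expression pattern of an uncharacterized xenobiotic to reference profiles of well-characterized toxins, as described above, can be sketched as a nearest-neighbour search over per-gene log-ratios. This is only an illustration under assumed conditions:

- Pearson correlation is used as the similarity measure.
- Every profile covers the same ordered gene list.
- The compound names and values are hypothetical.

Real array studies would also need normalization and statistical testing, which are not shown.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / (sx * sy)

def rank_references(unknown: List[float],
                    references: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Reference toxins ordered from most to least similar to `unknown`.

    Each profile holds log2(treated/control) ratios for the same ordered gene list.
    """
    scores = [(name, pearson(unknown, prof)) for name, prof in references.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical log2 ratios for five genes.
refs = {
    "reference_toxin_A": [2.0, 1.5, -1.0, 0.0, 0.5],
    "reference_toxin_B": [-1.0, 0.0, 2.0, 1.5, -0.5],
}
unknown_profile = [1.8, 1.2, -0.8, 0.1, 0.4]
print(rank_references(unknown_profile, refs)[0][0])  # reference_toxin_A
```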

Although documenting and explaining these genetic changes presents a formidable obstacle to understanding the different mechanisms of [...] disease progression, [...] this difficult challenge. Indeed, several 'differential expression analysis' methods have been developed which facilitate the identification of gene products that demonstrate altered expression in cells of one population compared to another. These methods [...] gene expression in many situations, including invading pathogenic microbes (Zhao et al. 1998), cells responding to extracellular and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997, Maldarelli et al. 1998), chemically treated cells (Syed et al. 1997, Rockett et al. 1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998), activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984, Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis technologies are applicable to a broad range of models, perhaps their most important advantage is that, in most cases, absolutely no prior knowledge of the specific genes which are up- or down-regulated is required.

The field of differential expression analysis is a large and complex one, with many techniques available to the potential user. These can be categorized into several methodological approaches, including:

(1) Differential screening,

(2) Subtractive hybridization (SH), including methods such as chemical cross-linking subtraction (CCLS), suppression-PCR subtractive hybridization (SSH) and representational difference analysis (RDA),

(3) Differential display (DD),

(4) Restriction endonuclease facilitated analysis, including serial analysis of gene expression (SAGE) and gene expression fingerprinting (GEF),

(5) Gene expression arrays, and

(6) Expressed sequence tag (EST) analysis.

The above approaches have been used successfully to isolate differentially expressed genes in different model systems. However, each method has its own subtle (and sometimes not so subtle) characteristics which incur various advantages and disadvantages. Accordingly, it is the purpose of this review to clarify the mechanistic principles underlying the main differential expression methods and to highlight some of the broader considerations and implications of this very powerful and increasingly popular technique. Specifically, we will concentrate on the so-called 'open' systems, namely those which do not require any knowledge of gene sequences and, therefore, are useful for isolating unknown genes. Two 'closed' systems (those utilising previously identified gene sequences), EST analysis and the use of DNA arrays, will also be considered briefly for completeness. Whilst emphasis will often be placed on suppression PCR subtractive hybridization (SSH, the approach employed in this laboratory), it is the aim of the authors to highlight, wherever possible, those areas of common interest to those who use, or intend to use, differential gene expression analysis.
Differential cDNA library screening

Despite the development of multiple technological advances which have recently brought the field of gene expression [...], [...] and characterization of differentially expressed genes has existed for many years. One of the original approaches used to identify such genes was [...] 20 years ago by St John and Davis (1979). These authors developed a method, termed 'differential plaque filter hybridization', which was used to isolate galactose-inducible DNA sequences from yeast. The theory is simple: a genomic DNA library is prepared from [...] generated, and multiple filter replicas are [...] radioactively (or otherwise) labelled [...] mRNA populations. Those mRNAs which are differentially expressed in the treated cell population [...] treated cells.

Furthermore, labelled cDNA from different test conditions can be used to probe [...] identification of mRNAs which are only [...] conditions. For example, St John and Davis (1979) screened [...]-derived probes in order to obtain [...]. Although groundbreaking [...], [...] insensitive and time-consuming, as up to 2 months are required to complete the identification of genes which are differentially expressed in the test population. In addition, there is no convenient way to check that the procedure has worked until the whole process has been completed.
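The comparison at the heart of differential screening can be sketched as a per-clone signal comparison. Replica filters are probed with labelled cDNA from treated and control cells, and clones that light up on one replica but not the other are flagged.

The sketch below rests on several assumptions. It takes background-subtracted hybridization intensities as available for every clone on both replicas, and it uses an arbitrary two-fold threshold and a hypothetical noise floor. Neither value, nor the clone identifiers, comes from St John and Davis (1979).

```python
from typing import Dict, List, Tuple

def differential_clones(treated: Dict[str, float],
                        control: Dict[str, float],
                        fold: float = 2.0,
                        floor: float = 1.0) -> List[Tuple[str, str]]:
    """Flag clones whose signal differs at least `fold`-times between replicas.

    treated/control: background-subtracted signal per clone ID on each filter.
    `floor` (hypothetical noise level) avoids division by zero for clones with
    no detectable signal on one replica.
    Returns (clone_id, "up" or "down"), relative to the treated cells.
    """
    calls = []
    for clone in sorted(treated.keys() & control.keys()):
        t = max(treated[clone], floor)
        c = max(control[clone], floor)
        if t / c >= fold:
            calls.append((clone, "up"))
        elif c / t >= fold:
            calls.append((clone, "down"))
    return calls

# Hypothetical plaque signals from two replica filters.
treated_filter = {"p1": 50.0, "p2": 10.0, "p3": 0.0}
control_filter = {"p1": 5.0, "p2": 11.0, "p3": 20.0}
print(differential_clones(treated_filter, control_filter))
```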

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early [...] 1979) soon gave rise to a search for more convenient methods of analysis. One of the first to be developed was SH, numerous variations of which have since been reported (see below). In general, [...] of mRNA/cDNA from one population (tester) [...] mRNA/cDNA from another (driver), followed by separation of the unhybridized tester fraction (differentially expressed) from the hybridized common sequences. This step has been achieved physically, chemically and through the use of selective polymerase chain reaction (PCR) techniques.

Physical separation 

[...] subtractive hybridization technology involved the physical separation of [...]. [...] methods of achieving this have been described, including hydroxyapatite chromatography [...] and oligo(dT)-latex separation (Hara et al. 1991). In the first approach, common mRNA species are removed by cDNA (from test cells)-mRNA (from control cells) subtractive hybridization followed by hydroxyapatite chromatography, as hydroxyapatite specifically adsorbs the cDNA-mRNA hybrids. The unadsorbed cDNA is then used either for the construction of a cDNA library of differentially expressed genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe [...] (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al. 1984). A schematic diagram of the procedure is shown in figure 1.



Less rigorous physical separation procedures coupled with sensitivity-enhancing PCR steps were later developed as a means to overcome some of the problems [...]. [...] (1990) described a method of subtraction utilizing biotin-affinity systems as a means to remove hybridized common sequences. In this process, both the control and tester mRNA populations are first converted to cDNA and an adaptor ('oligovector', containing a restriction site) is ligated to both sides. Both populations are then amplified by PCR, but the driver cDNA population is subsequently digested with the restriction endonuclease whose site is contained in the adaptor. This serves to cleave the oligovector and reduce the amplification potential of the control population [...]. Following denaturation and hybridization, the mix is applied to a biocytin column (streptavidin may also be used) to remove the [...] population, including heteroduplexes formed by annealing of common sequences from the tester population. The procedure is repeated several times following the addition of fresh control cDNA. In order to further enrich those species differentially expressed in the tester cDNA, the subtracted tester population is amplified by PCR following every second subtraction cycle. After six cycles of subtraction (three reamplification steps) the reaction mix is ligated into a vector for further analysis.

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver) population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite chromatography. The only cDNAs which remain are those which are differentially expressed in the treated/altered population. In order to facilitate the recovery of full-length clones, small cDNA fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent and Dawid (1983).

In a slightly different approach, Hara et al. (1991) utilized a method whereby oligo(dT30) primers attached to a latex substrate are used to first capture mRNA extracted from the control population. Following first-strand cDNA synthesis, the RNA strand of the heteroduplexes is removed by heat denaturation and centrifugation (the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed). A quantity of tester mRNA is then repeatedly hybridized to the immobilized control (driver) cDNA, which is present in 20-fold excess. After several rounds of hybridization, the only mRNA molecules left in the tester mRNA population are those which are not found in the driver cDNA-oligotex-dT30 population. These tester-specific mRNA species are then converted to cDNA and, following the addition of adaptor sequences, amplified by PCR. The PCR products are then ligated into a vector for further analysis using restriction sites incorporated into the PCR primers. A schematic illustration of this subtraction process is shown in figure 2.


However, all these methods utilising physical separation have been described as inefficient due to the requirement for large starting amounts of mRNA, significant loss of material during the separation process and a need for several rounds of hybridization. Hence, new methods of differential expression analysis have recently been designed to eliminate these problems.







Chemical Cross-Linking Subtraction (CCLS)

In this technique, originally described by Hampson et al. (1992), driver mRNA is mixed with tester cDNA (first strand only) in a ratio of >20:1. The common sequences form cDNA:mRNA hybrids, leaving the tester-specific species as single-stranded cDNA. Instead of being physically separated, these hybrids are inactivated chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are then synthesized from the remaining single-stranded cDNA species and used to screen a cDNA library made from the tester cell population. Unreacted mRNA species remaining from the driver are not converted into probe material, owing to the specificity of the Sequenase T7 DNA polymerase used to make the probe. A schematic diagram of the system is shown in figure 3.

It has been shown that the differentially expressed sequences can be enriched at least 300-fold with one round of subtraction (Hampson et al. 1992), and that the technique should allow isolation of cDNAs derived from transcripts that are present at less than 50 copies per cell. This equates to genes at the low end of intermediate abundance (see table 1). The main advantages of the CCLS approach are that it is rapid, technically simple and also produces fewer false positives than other differential expression analysis methods. However, like the physical separation protocols, a major drawback with CCLS is the large amount of starting material [...]


refined so that a renewable source of RNA can be generated. The degenerate random oligonucleotide primed (DROP) adaptation (Hampson et al. [...], Hampson and Hampson 1997) uses random hexanucleotide sequences to prime solid-phase-synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA population which is representative of the expressed gene pool and can be used to synthesize sense RNA for use as driver material. Furthermore, if the final pool of random cDNA fragments is reamplified using biotinylated T7 primer and random hexamer, the product can be captured with streptavidin beads and the antisense strand eluted for use as tester. Since both target and driver can be generated from the same DROP product, subtraction can be performed in both directions (i.e. for up- and down-regulated species) between two different DROP products.

Figure 3. Chemical cross-linking subtraction. Excess driver mRNA is mixed with 1st strand tester cDNA. The common sequences form mRNA:cDNA hybrids which are cross-linked with 2,5-diaziridinyl-1,4-benzoquinone (DZQ), and the remaining cDNA sequences are differentially expressed in the tester population. Probes are made from these sequences using Sequenase 2.0 DNA polymerase, which lacks reverse transcriptase activity and, therefore, does not react with the remaining mRNA molecules from the driver. The labelled probes are then used to screen a cDNA library for clones of differentially expressed sequences. Adapted from Walter et al. (1996), with permission.

Table 1. The abundance of mRNA species and classes in a typical mammalian cell.
Columns: mRNA class; copies of each species per cell; no. of mRNA species in each class; mean [...] of each species in class; mean mass (ng) of each species per µg total RNA.
Modified from Bertioli et al. (1995).

Representational Difference Analysis (RDA)

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique originally applied to genomic DNA as a means of identifying differences between two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction and amplification involving subtractive hybridization of the tester in the presence of excess driver. Sequences in the tester that have homologues in the driver are rendered unamplifiable, whereas those genes expressed only in the tester retain the ability to be amplified by PCR. The procedure is shown schematically in figure 4.

In essence, the driver and tester mRNA populations are first converted to cDNA and amplified by PCR following the ligation of an adaptor. The adaptors are then removed from both populations and a new (different) adaptor is ligated to the amplified tester population only. Driver and tester populations are next melted and hybridized together in a ratio of 100:1. Following hybridization, only tester:tester homohybrids have 5' adaptors at each end of the DNA duplex and can, thus, be filled in at both 3' ends. Hence, only these molecules are amplified exponentially during the subsequent PCR step. Although tester:driver heterohybrids are present, they only amplify in a linear fashion, since the strand derived from the driver has no adaptor to which the primer can bind. Driver:driver homohybrids have no adaptors and, therefore, are not amplified. Single-stranded molecules are digested with mung bean nuclease before a further PCR enrichment of the tester:tester homohybrids. The adaptors on the amplified tester population are then replaced and the whole process repeated a further two or three times using an increasing excess of driver (Hubank and Schatz used tester:driver ratios of 1:400, 1:80000 and 1:800000 for the second, third and fourth hybridizations, respectively). Different adaptors are ligated to the tester between successive rounds of hybridization and amplification to prevent the accumulation of PCR products that might interfere with subsequent amplifications. The final display is a series of differentially expressed gene products easily observable on an ethidium bromide gel.
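Why only tester:tester homohybrids are enriched can be illustrated with idealized amplification arithmetic, one rule per hybrid class:

- Duplexes with adaptors on both strands roughly double each cycle.
- Tester:driver heterohybrids gain only one new copy per cycle, because they amplify linearly from the single primed strand.
- Driver:driver homohybrids are not amplified.

The sketch assumes perfect efficiency and ignores plateau effects. The starting amounts are hypothetical, chosen only to reflect a large driver excess.

```python
def rda_products(tester_tester: float, tester_driver: float,
                 driver_driver: float, cycles: int) -> dict:
    """Idealized copy numbers of each hybrid class after `cycles` of PCR.

    Assumes 100% efficiency: exponential (x2 per cycle) for duplexes with
    adaptors on both strands, linear (+1 copy per template per cycle) for
    duplexes with one adaptor-bearing strand, and none without adaptors.
    """
    return {
        "tester:tester": tester_tester * 2 ** cycles,
        "tester:driver": tester_driver * (1 + cycles),
        "driver:driver": driver_driver,
    }

# Hypothetical duplex counts after hybridization with a large driver excess:
# few tester:tester duplexes, many tester:driver and driver:driver duplexes.
after_pcr = rda_products(tester_tester=1, tester_driver=100,
                         driver_driver=10_000, cycles=20)
for kind, copies in after_pcr.items():
    print(kind, copies)
```

Even starting from a single tester:tester duplex, 20 idealized cycles leave it far more abundant than the linearly amplified heterohybrids. This is the basis of the enrichment that each difference product compounds.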

The main advantages of RDA are that it offers a reproducible and sensitive approach to the analysis of differentially expressed genes. Hubank and Schatz (1994) reported that they were able to isolate genes that were differentially expressed in substantially less than 1% of the cells from which the tester is derived. Perhaps the main drawback is that multiple rounds of ligation, hybridization and amplification are required. The procedure is, therefore, lengthier than many other differential display approaches and provides more opportunity for operator-induced [...]. [...] this has been solved to some degree by O'Neill and Sinclair (1997) through the use of HPLC-purified adaptors. These are free of the truncated adaptors [...] to be a major source of the false positive bands. A very similar technique to RDA, termed linker capture subtraction (LCS), was described by Yang and Sytkowski (1996).
Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of 12/24 adaptor strands (oligonucleotides) is ligated to each other and to the digested cDNA products. The 12mer is subsequently melted away and the 3' ends filled in using Taq DNA polymerase. Each cDNA population is then amplified using PCR, following which the 1st set of adaptors is removed [...] the amplified tester cDNA population, after which the tester is hybridized against a large excess of driver. The 12mer adaptors are melted and the 3' ends filled in as before. PCR is carried out with primers identical to [...] tester:tester combinations. ssDNA products are removed with mung bean nuclease, leaving the 'first difference product'. This is digested and a third set of 12/24 adaptors added before repeating the subtraction process from the hybridization stage. The process is repeated to the 3rd or 4th difference product, as described by Lisitsyn et al. (1993) and Hubank and Schatz (1994).



Differential gene expression 






Suppression PCR Subtractive Hybridization (SSH)

The most recent adaptation of the SH approach to differential expression analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996). They reported that a 1000-5000-fold enrichment of rare cDNAs (equivalent to isolating mRNAs present at only a few copies per cell) can be obtained without the need for multiple hybridizations/subtractions. Instead of physical or chemical removal of the common sequences, a PCR-based suppression system is used (see figure 5).

In SSH, excess driver cDNA is added to two portions of the tester cDNA which have been ligated with different adaptors. A first round of hybridization serves to enrich differentially expressed genes and equalize rare and abundant messages. Equalization occurs because reannealing is more rapid for abundant molecules than for rarer molecules, owing to the second-order kinetics of hybridization (James and Higgins 1985). The two primary hybridization mixes are then mixed together in the presence of excess driver and allowed to hybridize further. This step permits the annealing of single-stranded complementary sequences which did not hybridize in the primary hybridization, and in doing so generates templates for PCR amplification. Although there are several possible combinations of the single-stranded molecules present in the secondary hybridization mix, only one particular combination can amplify exponentially: sequences differentially expressed in the tester cDNA whose complementary strands carry different adaptors.
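The equalization step can be made concrete with the second-order reannealing law. For a species at initial single-stranded concentration C0, integrating dC/dt = −k·C² gives C(t) = C0/(1 + k·C0·t). Abundant species are therefore depleted from the single-stranded pool much faster than rare ones, and at long times every species tends towards about 1/(k·t).

The sketch below evaluates this idealized expression for a single self-annealing species. It ignores the driver and any cross-hybridization, and the rate constant, time and concentrations are arbitrary assumptions.

```python
def single_stranded_remaining(c0: float, k: float, t: float) -> float:
    """Single-stranded concentration after time t under second-order kinetics.

    Solves dC/dt = -k * C**2 with C(0) = c0, giving C(t) = c0 / (1 + k*c0*t).
    """
    return c0 / (1.0 + k * c0 * t)

def abundance_ratio(c_abundant: float, c_rare: float, k: float, t: float) -> float:
    """Ratio of abundant to rare single-stranded species after reannealing."""
    return (single_stranded_remaining(c_abundant, k, t)
            / single_stranded_remaining(c_rare, k, t))

# Hypothetical: a 1000-fold abundance difference before hybridization...
before = 1000.0 / 1.0
# ...shrinks to about 2-fold after reannealing with k*t = 1 (arbitrary units).
after = abundance_ratio(1000.0, 1.0, k=1.0, t=1.0)
print(round(before, 1), round(after, 3))  # 1000.0 1.998
```

In this idealized model the remaining single-stranded species converge, which is the sense in which rare and abundant messages are "equalized". As noted above, equalization observed in practice is by no means perfect.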

Having obtained the final differential display, two options are available if cloning
of cDNAs is desired. One is to transform the whole of the final PCR reaction into
competent cells. Transformed colonies can then be isolated and their inserts
characterized by sequencing, restriction analysis or PCR. Alternatively, the final
PCR products can be resolved on a gel and the individual bands excised, reamplified
and cloned. The first approach is technically simpler and less time consuming.
However, ligation/transformation reactions are known to be biased towards the
cloning of smaller molecules, and so the final population of clones will probably not
contain a representative selection of the larger products. In addition, although
equalization theoretically occurs, observations in this laboratory suggest that it is
by no means perfectly accomplished. Consequently, some gene species are present
in higher numbers than others, and this will be reflected in the final population
of clones. Thus, in order to obtain a substantial proportion of those gene species that
actually demonstrate differential expression in the tester population, the number of
clones that will have to be screened after this step may be substantial. The second
approach is initially more time consuming and technically demanding. However, it
would appear to offer better prospects for cloning larger and low-abundance gel
products. In addition, one can incorporate a screening step that differentiates
products of different sequences but of the same size (HA-staining, see
later). In this way, a good idea of the final number of clones to be isolated and
identified can be achieved.
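Why unequal representation inflates the screening effort can be made concrete with a simple sampling model. The sketch below assumes, as a simplification not stated in the text, that each picked clone is an independent draw in which a gene species appears with probability equal to its share of the clone population; it then solves 1 - (1 - p)^n >= confidence for the number of clones n.

```python
import math


def clones_needed(fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of randomly picked clones giving at least `confidence`
    probability of picking, at least once, a species that makes up `fraction`
    of the clone population (independent draws assumed)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    # P(species missed in n picks) = (1 - fraction) ** n; solve for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


for frac in (0.05, 0.01, 0.001):
    print(f"species at {frac:.1%} of clones: screen about {clones_needed(frac)} clones")
```

Under this model a species at 1% of the clones needs about 299 picks for a 95% chance of being seen once, and one at 0.1% needs roughly ten times as many, which is why over-represented species make exhaustive screening expensive.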

An alternative (or even complementary) approach is to use the final differential
[...] with primers [...] which are exponentially [...]
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of
differentially expressed genes in this manner enables the construction of a fingerprint
Figure 5. PCR-Select cDNA subtraction. In the primary hybridization, an excess of driver cDNA is
added to each tester cDNA population. The samples are heat denatured and allowed to hybridize
for between 3 and 8 h. This serves two purposes: (1) to equalize rare and abundant molecules and
(2) to enrich for differentially expressed sequences. cDNAs that are not differentially expressed
form type c molecules with the driver. In the secondary hybridization, the two primary
hybridizations are mixed together without denaturing. Fresh [...] type e molecules are formed,
which are subsequently amplified using two rounds of PCR. The final products can be visualized
on an agarose gel, labelled directly or cloned into a vector for downstream manipulation. Adapted
from Diatchenko et al. (1996) and Gurskaya et al. (1996), with permission.
[...] which are differentially expressed in rat liver following short-term exposure to the enzyme
inducers phenobarbital and Wy-14,643.
of expressed genes which are unique to each compound and time/dose point. Such
information could be useful in short-term characterization of the toxic potential of
new compounds by comparing the gene-expression profiles they elicit with those
produced by known inducers. Figure 6 shows a flow diagram of the method used to
isolate, verify and clone differentially expressed genes, and figure 7 shows expression
profiles obtained from a typical SSH experiment [...]. Individual band sequencing [...]
interrogation reveals many genes which are either up- or down-regulated by
phenobarbital in the rat (tables 2 and 3).
One of the advantages in using the SSH approach is that no prior knowledge is
required of which specific genes are up/down-regulated subsequent to xenobiotic
Figure 7. SSH display patterns obtained from rat liver following 3-day treatment with Wy-14,643 or
phenobarbital. mRNA extracted from control and treated livers was used to generate the
differential displays using the PCR-Select cDNA subtraction kit (Clontech). Lanes: 1, 1 kb
ladder; 2, genes upregulated following Wy-14,643 treatment; 3, genes downregulated following
Wy-14,643 treatment; 4, genes upregulated following phenobarbital treatment; 5, genes
downregulated following phenobarbital treatment; 6, 1 kb ladder. Reproduced from Rockett et
al. (1997), with permission.

exposure, and an almost complete complement of genes is obtained. For example,
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643
up-regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive
species) and produces 48 up- and 37 down-regulated genes in the guinea pig, a
resistant species (Rockett, Swales, Esda and Gibson, unpublished observations).
One of these genes, CD81, was up-regulated in the rat and down-regulated in the
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is
a widely expressed cell surface protein which is involved in a large number of cellular
processes including adhesion, activation, proliferation and differentiation (Levy et
al. 1998). Since all of these functions are altered to some extent in the phenomena
of hepatomegaly and non-genotoxic hepatocarcinogenesis, it is intriguing, and
probably mechanistically relevant, that CD81 expression is differentially regulated
between a resistant and a susceptible species. However, the downside of this approach
is that, although the majority of genes can be sequenced and matched to database
sequences, the latter are predominantly expressed sequence tags or genes of completely
unknown function, thus partially obscuring a realistic overall assessment of the
critical genes of genuine biological interest. Notwithstanding the lack of complete
functional identification of altered gene expression, such gene profiling studies
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge,
thereby serving as a mechanistically relevant platform for further detailed
investigations.
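The idea of comparing the fingerprint of a new compound with those of known inducers can be sketched in a few lines. This is a hypothetical illustration, not the authors' procedure: each fingerprint is assumed to be a mapping from gene to +1 (up-regulated) or -1 (down-regulated), similarity is an ad hoc signed overlap normalized by the number of genes in either profile, and all gene and compound names are placeholders.

```python
from typing import Dict

Fingerprint = Dict[str, int]  # gene -> +1 (up-regulated) or -1 (down-regulated)


def signed_overlap(query: Fingerprint, reference: Fingerprint) -> float:
    """Concordant genes score +1, discordant genes -1, normalized by the
    number of genes appearing in either fingerprint (a hypothetical metric)."""
    genes = set(query) | set(reference)
    if not genes:
        return 0.0
    shared = set(query) & set(reference)
    score = sum(query[g] * reference[g] for g in shared)
    return score / len(genes)


def best_match(query: Fingerprint, references: Dict[str, Fingerprint]) -> str:
    """Name of the reference fingerprint most similar to the query."""
    return max(references, key=lambda name: signed_overlap(query, references[name]))


# Placeholder profiles for two known inducers and one new compound.
known = {
    "inducer_A": {"gene1": 1, "gene2": 1, "gene3": -1},
    "inducer_B": {"gene4": 1, "gene5": -1, "gene2": -1},
}
new_compound = {"gene1": 1, "gene2": 1, "gene5": -1}
print(best_match(new_compound, known))
```

A real comparison would also need to weigh magnitude, time and dose, which this sign-only sketch deliberately ignores.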




Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang
and Pardee 1992), this method is now more commonly referred to as 'differential
Clone 2 75.3%
Clone 1 95.2%
Clone 2 93.6
CYP2B1
Preproalbumin
Serum albumin mRNA
NCI-CGAP-Pr1 H. sapiens (EST)
CYP2B1
CYP2B1
CYP2B2
TRPM-2 mRNA
Sulfated glycoprotein
Preproalbumin
Serum albumin mRNA
CYP2B1
Haptoglobin mRNA, partial alpha
18S, 5.8S & 28S rRNA

Bands 1-[...], 6-9, 13, 14 and [...]-20 were shown to be false positives by dot blot analysis and,
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above
genes do not represent the complete spectrum of genes which are up-regulated in rat liver by
phenobarbital, but simply represent the genes sequenced and identified to date.

Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital.

Soares p3NMF19.5 M. musculus (EST)
Soares mouse NML M. musculus (EST)
NCI-CGAP-Pr1 H. sapiens (EST)
Ribosomal protein
Soares mouse embryo NbME13.5 (EST)
Fibrinogen B-beta-chain
Apolipoprotein E gene
Soares p3NMF19.5 M. musculus (EST)
Stratagene mouse testis (EST)
R. norvegicus R-ASP 1 mRNA
Soares mouse mammary gland (EST)

EST, expressed sequence tag. Bands [...] were shown to be false positives by dot blot analysis and,
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above
genes do not represent the complete spectrum of genes which are down-regulated in rat liver by
phenobarbital, but simply represent the genes sequenced and identified to date.






display' (DD). In this method, [...]
populations are amplified in separate reactions using reverse transcriptase-PCR
(RT-PCR). The products are then run side by side on sequencing gels. Those
bands which are present in one display only, or which are much more intense in one
display compared to the other, are differentially expressed and may be recovered for
further characterization. One advantage of this system is the speed with which it can
be carried out: 2 days to obtain a display and as little as a week to make and identify [...]

Two commonly used variations are based on different methods of priming
first-strand cDNA synthesis (figure 8). One is to use an oligo dT with a 2-base 'anchor'
at the 3' end, e.g. (dT)nCA-3' (Liang and Pardee 1992). Alternatively, an
arbitrary primer may be used for first-strand cDNA synthesis (Welsh et al. 1992).

One advantage of this second approach is that PCR products may be
obtained from mRNAs that are not polyadenylated, such as many bacterial mRNAs
(Wong and McClelland 1994). In both cases, following reverse transcription and
denaturation, second-strand cDNA synthesis is carried out with an arbitrary primer
(arbitrary primers have a single base at each position, as compared to random
primers which contain a mixture of all four bases at each position). The number of
products in the resulting display varies depending on the system (primer
length and composition, polymerase and gel system), but usually includes 50-100
products per primer set (Band and Sager 1989). When a combination of different
dT-anchors and arbitrary primers is used, almost all mRNA species from a cell can
be amplified. When the cDNA products from two different populations are analysed
side by side on a polyacrylamide gel, differences in expression can be identified and
the appropriate bands recovered for cloning and further analysis.
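The primer arithmetic behind "almost all mRNA species ... can be amplified" can be sketched as follows. The sketch assumes the widely used two-base anchor convention 5'-T(n)MN-3', with M drawn from A, C or G (never T, which would simply extend the poly(dT) run) and N any base, so that the (dT)nCA primer above is one of twelve; the run of 11 T residues and the set of 20 arbitrary primers are hypothetical choices, and the 50-100 products per primer set is the range quoted in the text.

```python
from itertools import product
from typing import List, Tuple


def anchored_oligo_dt_primers(n_t: int = 11) -> List[str]:
    """Two-base-anchored oligo(dT) primers written 5'->3': T(n) + M + N.
    M is A, C or G (an assumed convention); N is any of the four bases."""
    return ["T" * n_t + m + n for m, n in product("ACG", "ACGT")]


def display_size(n_anchors: int, n_arbitrary: int,
                 products_per_pair: Tuple[int, int] = (50, 100)) -> Tuple[int, int, int]:
    """Number of primer-pair reactions and the range of bands they would yield."""
    pairs = n_anchors * n_arbitrary
    low, high = products_per_pair
    return pairs, pairs * low, pairs * high


anchors = anchored_oligo_dt_primers()
print(len(anchors), anchors[:3])
print(display_size(len(anchors), 20))  # 20 arbitrary primers: a hypothetical choice
```

Under these assumptions, 240 reactions would yield roughly 12 000-24 000 bands, comparable to the 8000-15 000 mRNA species per cell cited later, although one mRNA can give several bands, so band counts overstate coverage.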

Although DD is perhaps the most popular approach used today for identifying
differentially expressed genes, it does suffer from several perceived disadvantages:

(1) A bias towards high copy number mRNAs (Bertioli et al. 1995), although this
has been disputed (Wan et al. 1996), and the isolation of very rare mRNAs has been
reported under certain circumstances (Guimaraes et al. 1995a).

(2) The cDNAs obtained often represent only the extreme 3' end of the mRNA
(often the 3'-untranslated region), although this may not always be the case
(Guimaraes et al. 1995a). Since the 3' end is often not included in GenBank and
shows variation between organisms, cDNAs identified by DD cannot always be
matched to known genes, even if those genes have already been identified.

(3) The pattern of differential expression seen on the display often cannot be
confirmed by other methods, with false positives reported in up to 70% of cases
(Sun et al. 1994). Some adaptations have been shown to reduce false positives,
including the use of two reverse transcriptases (Sung and Denman 1997),
comparison of uninduced and induced cells over a time course (Burn et al. 1994)
and comparison of DD-PCR products from two uninduced and two induced cell
lines (Sompayrac et al. 1995). The latter authors also reported that the use of
cytoplasmic RNA rather than total RNA reduces false positives arising from
nuclear RNA that is not transported to the cytoplasm.



Further details of the background, strengths and weaknesses of DD can be found in the
articles by Liang et al. (1995) and Wan et al. (1996).

Figure 8. Two approaches to differential display (DD) analysis. First-strand synthesis can be carried out
either with a poly(dT)NN primer (where N = G, C or A) or with an arbitrary primer. The use of
different combinations of G, C and A to anchor the first-strand poly(dT) primer enables the priming
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more
places along the length of the mRNA, allowing first-strand cDNA synthesis to occur at none, one
or more points in the same gene. In both cases, second-strand synthesis is carried out with an arbitrary
primer. Since these arbitrary primers for the second strand may also hybridize to the first-strand cDNA
in a number of different places, several different second-strand products may be obtained from one
binding point of the first-strand primer. Following second-strand synthesis, the original set of primers
is used to amplify the second-strand products, with the result that numerous gene sequences are
amplified.



Restriction endonuclease-facilitated analysis of gene expression 

Serial Analysis of Gene Expression (SAGE) 

A more recent development in the field of differential display is SAGE analysis
(Velculescu et al. 1995). This method uses a different approach to those discussed so
far and is based on two principles. Firstly, in more than [...] of cases, short
nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient
information to identify their gene of origin. Secondly, concatenation (linking) of tags
[...] single clone. Figure 9 shows a schematic representation of the SAGE process. In this
procedure, double-stranded cDNA from the test cells is synthesized with a
biotinylated polydT primer. Following digestion with a commonly cutting (4 bp
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the
cDNA population are captured with streptavidin beads. The captured population is
split into two and different adaptors are ligated to the 5' ends of each group. Incorporated
into the adaptors is a recognition sequence for a type IIS restriction enzyme, one
which cuts DNA at a defined distance (< 20 bp) from its recognition sequence.
Hence, following digestion of each captured cDNA population with the IIS enzyme,
the adaptors plus a short piece of the captured cDNA are released. The two
populations are then ligated and the products amplified. The amplified products are
cleaved with the original anchoring enzyme, religated (concatemers are formed in
the process) and cloned. The advantage of this system is that hundreds of gene tags
can be identified by sequencing only a few clones. Furthermore, the number of times
a given transcript is identified is a quantitative measurement of that gene's
abundance in the original population, a feature which facilitates identification of
differentially expressed genes in different cell populations.
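The counting step that makes SAGE quantitative can be sketched as follows. Assumptions, all labelled in the code: the anchoring-enzyme site is taken to be CATG, as drawn in figure 9; each ditag between two sites carries one tag read directly and a second tag that must be reverse-complemented because tags are joined tail to tail; and tag length is a parameter (10 bp by default). The sequence in the usage lines is a toy example with 4 bp tags, not real data.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_sage_tags(concatemers: Iterable[str], anchor: str = "CATG",
                    tag_len: int = 10) -> Counter:
    """Split sequenced concatemers at the (assumed) anchoring-enzyme site and
    count tags. The first tag of each ditag is read as-is; the second is
    reverse-complemented because the two tags were ligated tail to tail."""
    counts: Counter = Counter()
    for concatemer in concatemers:
        # [1:-1] drops flanking vector sequence before the first / after the last site.
        for ditag in concatemer.split(anchor)[1:-1]:
            if len(ditag) < 2 * tag_len:
                continue  # truncated ditag: skip it
            counts[ditag[:tag_len]] += 1
            counts[reverse_complement(ditag[-tag_len:])] += 1
    return counts


def tags_per_million(counts: Counter) -> Dict[str, float]:
    """Normalize raw tag counts so libraries of different depth can be compared."""
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}


library = count_sage_tags(["CATGAAAACCCCCATGAAAAGGGGCATG"], tag_len=4)  # toy 4 bp tags
print(library.most_common())
```

Comparing the normalized counts for the same tag in a test and a control library gives the abundance difference that the text describes.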

Some disadvantages of SAGE analysis are that the method is technically difficult,
requires a large amount of accurate sequencing and is biased towards abundant
mRNAs; in addition, it has not been validated in the pharmaco/toxicogenomic setting
and has only been used to examine well-known tissue differences to date.



Gene Expression Fingerprinting (GEF) 

A different capture/restriction digest approach for isolating differentially
expressed genes was described by Ivanova and Belyavsky (1995). In this
method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The
cDNA population is then digested with a specific endonuclease and captured with
magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion
products. The use of restricted 3' ends alone serves to reduce the complexity of the
cDNA fragment pool and helps to ensure that each RNA species is represented by
not more than one restriction product. An adaptor is ligated to facilitate subsequent
amplification of the captured population. PCR is carried out with one adaptor-specific
and one biotinylated polydT primer. The reamplified population is
recaptured and the non-biotinylated strands removed by alkaline dissociation. The
non-biotinylated strand is then resynthesized using a different adaptor-specific
primer in the presence of a radiolabelled dNTP. The labelled immobilized 3' cDNA
ends are next sequentially treated with a series of different restriction endonucleases,
and the products from each digestion are analysed by PAGE. The result is a fingerprint
composed of a number of ladders (equal to the number of sequential digests used).
By comparing test versus control fingerprints, it is possible to identify differentially
expressed products, which can then be isolated from the gel and cloned. The
advantages of this procedure are that it is very robust and reproducible, and the
authors estimate that 80-93% of cDNA molecules are represented in the final
fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more
than 300-400 bands, which compares poorly with the 1000 or more which are
estimated to be produced in an average experiment. The use of 2-D gels such as
those described by Uitterlinden et al. ([...]) may help to overcome this problem.
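How sequential digests turn one pool of immobilized 3' fragments into one ladder per enzyme can be sketched as below. This is a simplified model, not the published protocol: each fragment is written 5' to 3' from its free (adaptor) end towards the bead, an enzyme releases a fragment at the first occurrence of its recognition site, the band length is taken as the distance from the free end to the start of that site (exact cut offsets and labelling details are ignored), and fragments released by one enzyme are no longer available to the next. The HaeIII (GGCC) and AluI (AGCT) sites are real recognition sequences, but the fragments are invented.

```python
from typing import Dict, List, Sequence, Tuple


def sequential_digest_ladders(
    fragments: Sequence[str], enzymes: Sequence[Tuple[str, str]]
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Simulate sequential digestion of bead-bound 3'-terminal cDNA fragments.

    Returns one sorted list of band lengths ("ladder") per enzyme, plus the
    fragments that no enzyme in the series could cut.
    """
    remaining = list(fragments)
    ladders: Dict[str, List[int]] = {}
    for name, site in enzymes:
        released, kept = [], []
        for seq in remaining:
            pos = seq.find(site)  # first site from the free end (simplifying assumption)
            if pos >= 0:
                released.append(pos)
            else:
                kept.append(seq)
        ladders[name] = sorted(released)  # one gel lane per digest
        remaining = kept  # released fragments are gone before the next digest
    return ladders, remaining


# Invented fragments; note the last one also has an AluI site but is released by HaeIII first.
frags = ["AAGGCCTT", "AAAGCTAA", "TTTTTTTT", "CCAGCTGGCC"]
ladders, uncut = sequential_digest_ladders(frags, [("HaeIII", "GGCC"), ("AluI", "AGCT")])
print(ladders, uncut)
```

Because the total number of bands is the sum over all ladders, the model also makes the gel-resolution problem in the text easy to see: each added digest adds bands to the fingerprint.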

A similar method for displaying restriction endonuclease fragments [...]
digestion of the immobilized 3'-terminal cDNA fragments, these authors [...]
compared the profiles [...] further manipulation.



Figure 9. Serial analysis of gene expression (SAGE). cDNA is cleaved with the anchoring enzyme
(AE) and the 3' ends captured using streptavidin beads. The cDNA pool is divided in half and each
portion ligated to a different linker, each containing a type IIS restriction site (tagging enzyme,
TE). Restriction with the type IIS enzyme releases the linker plus a short length of cDNA
(XXXXX and OOOOO indicate nucleotides of different tags). The two pools of tags are then
ligated and amplified using linker-specific primers. Following PCR, the products are cleaved with
the AE and the ditags isolated from the linkers using PAGE. The ditags are then ligated (during
which process, concatenation occurs) and cloned into a vector of choice for sequencing. After
Velculescu et al. (1995), with permission.
DNA arrays

'Open' differential display systems are cumbersome in that it takes a great deal
of time to extract and identify candidate genes and then confirm that they are indeed
up- or down-regulated in the treated compared to the control tissue. Normally, the
latter process is carried out using Northern blotting or RT-PCR. Each of these steps
creates a bottleneck that hinders the ultimate goal of rapid analysis of gene
expression. These problems will likely be addressed by the development of
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996),
the introduction of which has signalled the next era in differential gene expression
analysis. DNA arrays consist of a gridded membrane or glass chip containing
hundreds or thousands of DNA spots, each consisting of multiple copies of part of
a known gene. The genes are often selected based on previously proven involvement
in oncogenesis, cell cycling, DNA repair, development and other cellular processes.
The spotted sequences are usually chosen to be as specific as possible for each gene
and animal species. Human and mouse arrays are already commercially available, and
a few companies, for example Clontech Laboratories and Research Genetics Inc., will
construct a personalized array to order. The technique is rapid in that hundreds or even
thousands of genes can be spotted on a single array, and mRNA/cDNA from the test
populations can be labelled and used directly as probe. When analysed with
appropriate hardware and software, arrays offer a rapid and quantitative means to
assess differences in gene expression between two cell populations. Of course, only
those genes which are on the array can be identified and quantified
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be
to combine an open and a closed system: a DNA array to directly identify and
quantify the expression of known genes in mRNA populations, and an open
system such as SSH to isolate unknown genes which are differentially expressed.
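A minimal sketch of the quantitative comparison an array allows, under stated assumptions: spot intensities for test and control are already background-corrected, the ratio for each gene is expressed as a log2 value, ratios are median-centred on the assumption that most genes are unchanged (one common global normalization, not the only one), and a 2-fold cut-off is applied as a conventional threshold. Gene names and intensities are invented.

```python
import math
from statistics import median
from typing import Dict, List, Tuple


def log2_ratios(test: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """Median-centred log2(test/control) for genes present in both samples."""
    ratios = {g: math.log2(test[g] / control[g]) for g in test if g in control}
    centre = median(ratios.values())  # assumes most genes are unchanged
    return {g: r - centre for g, r in ratios.items()}


def flag_changed(ratios: Dict[str, float], min_fold: float = 2.0) -> Tuple[List[str], List[str]]:
    """Genes up- or down-regulated by at least `min_fold` (a conventional cut-off)."""
    cutoff = math.log2(min_fold)
    up = sorted(g for g, r in ratios.items() if r >= cutoff)
    down = sorted(g for g, r in ratios.items() if r <= -cutoff)
    return up, down


control = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0, "geneD": 100.0, "geneE": 100.0}
treated = {"geneA": 400.0, "geneB": 100.0, "geneC": 25.0, "geneD": 100.0, "geneE": 100.0}
print(flag_changed(log2_ratios(treated, control)))
```

As the text stresses, this only ever reports on genes that were spotted on the array in the first place.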

One of the main advantages of DNA arrays is the huge number of gene fragments
which can be put on a membrane: some companies have reported gridding up to
60 000 spots on a single glass 'chip' (microscope slide). These high-density
chip-based micro-arrays will probably become available as mass-produced off-the-shelf
items in the near future. This should facilitate the more rapid determination of
differential expression in time- and dose-response experiments. Aside from their
high cost and the technical complexities involved in producing and probing DNA
arrays, the main problem which remains, especially with the newer micro-array
(gene-chip) technologies, is that results are often not wholly reproducible between
arrays. However, this problem is being addressed and should be resolved within the
next few years.
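One simple way to put a number on reproducibility between arrays is to correlate the log-transformed intensities of the same spots on two replicate arrays. The sketch below uses Pearson correlation on log2 intensities; this is an assumed, illustrative metric (others, such as agreement in called fold changes, are equally valid), and the intensity values are invented.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_agreement(array1: Sequence[float], array2: Sequence[float]) -> float:
    """Correlation of log2 spot intensities from two replicate arrays,
    with spots listed in the same order on both."""
    return pearson([math.log2(v) for v in array1], [math.log2(v) for v in array2])


# Invented intensities: the second array is uniformly twice as bright.
print(replicate_agreement([100, 200, 400, 800], [200, 400, 800, 1600]))
```

Working on the log scale means a uniform brightness difference between arrays does not lower the score, whereas spot-to-spot inconsistency does.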



EST databases as a means to identify differentially-expressed genes 

Expressed sequence tags (ESTs) are partial sequences of clones obtained from
cDNA libraries. Even though most ESTs have no formal identity (putative
identification is the best to be hoped for), they have proven to be a rapid and efficient
means of discovering new genes and can be used to generate profiles of gene
expression in specific cells. Since they were first described by Adams et al. (1991),
there has been a huge explosion in EST production, and it is estimated that there are
now well over a million such sequences in the public domain, representing over half
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis, as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few
from non-target tissue libraries. Programmes to assist in the assembly of such sets of
overlapping data may be developed in-house or obtained privately or from the
internet. For example, the Institute for Genomic Research (TIGR, found at
http://www.tigr.org) provides many software tools free of charge to the scientific
community. Included amongst these is the TIGR Assembler (Sutton et al. 1995), a
tool for the assembly of large sets of overlapping data such as ESTs, bacterial
artificial chromosomes (BACs) or small genomes. Candidate EST clones representing
different genes are then analysed using RNA blot methods for size and tissue
specificity and, if required, used as probes to isolate and identify the full length
cDNA clone for further characterization. In practice, however, the method is rather
more involved, requiring bioinformatic and computer analysis coupled with
confirmatory molecular studies. Vasmatzis et al. (1998) have described several
problems in this fledgling approach, such as separating highly homologous
sequences derived from different genes and an overemphasis of specificity for some
EST sequences. However, since these problems will largely be addressed by the
development of more suitable computer algorithms and an increased completeness
of the EST databases, it is likely that this approach to identifying differentially
expressed genes will enjoy more patronage in the future.
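The database search described above, many ESTs from the target tissue and none or few from other libraries, amounts to a simple filter once ESTs have been assembled into gene clusters. The sketch assumes that assembly has already been done (for instance with a tool such as the TIGR Assembler) and that each EST is reduced to a (cluster ID, library tissue) pair; the thresholds and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tissue_specific_candidates(est_hits: Iterable[Tuple[str, str]], target: str,
                               min_target: int = 3, max_other: int = 0) -> List[str]:
    """Clusters with at least `min_target` ESTs from the target tissue and at
    most `max_other` ESTs from all other tissue libraries combined.
    `est_hits` holds one (cluster_id, library_tissue) pair per EST."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cluster, tissue in est_hits:
        counts[cluster][tissue] += 1
    hits = []
    for cluster, by_tissue in counts.items():
        in_target = by_tissue.get(target, 0)
        elsewhere = sum(n for t, n in by_tissue.items() if t != target)
        if in_target >= min_target and elsewhere <= max_other:
            hits.append(cluster)
    return sorted(hits)


# Hypothetical EST assignments.
ests = ([("c1", "liver")] * 4 + [("c2", "liver")] * 3 + [("c2", "kidney")]
        + [("c3", "kidney")] * 5)
print(tissue_specific_candidates(ests, "liver"))
```

The candidates it returns would still need the RNA blot confirmation the text describes, and relaxing `max_other` trades specificity for sensitivity.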



Problems and potential of differential expression techniques 

The holistic or single cell approach?

When working with in vivo models of differential expression, one of the first
issues to consider must be the presence of multiple cell types in any given specimen.
For example, a liver sample is likely to contain not only hepatocytes, but also
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g.
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will
each have their own distinctive cell populations. Also, in the case of neoplastic tissue,
there are almost always normal, hyperplastic and/or dysplastic cells present in a
sample. One must, therefore, be aware that genes obtained from a differential
display experiment performed on an animal tissue model may not necessarily arise
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or
in situ RT-PCR should be used to confirm which cell types are expressing the
gene(s) of interest. This problem is probably most acute for those studying the
[...] expression of genes in the development of different cell types, where
there is a need to examine homogeneous cell populations. The problem is now being
addressed at the National Cancer Institute, where new micro-dissection techniques
have been employed to assist in their gene analysis programme, the Cancer Genome
Anatomy Project (CGAP) (for more information see web site:
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation
techniques available that utilise cell-specific antigens as a means to isolate target cells,
e.g. fluorescence activated cell sorting (FACS) (Dunbar et al. 1998, Kas-Dceicn et
al. 1998) and magnetic bead technology (Richard et al. 1998, Rogler et al. 1998).
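A one-line dilution model shows why a mixed-cell sample can mask, or misattribute, a change. It is a simplifying assumption for illustration only: the responding cell type contributes a fraction f of the gene's baseline transcripts in the tissue, its expression changes by a factor F, and all other cell types are unchanged, so the whole-tissue fold change is f*F + (1 - f).

```python
def tissue_fold_change(target_fraction: float, target_fold: float) -> float:
    """Apparent fold change in a whole-tissue sample when only one cell type,
    contributing `target_fraction` of the gene's baseline transcripts, changes
    its expression by `target_fold` and all other cells are unchanged
    (an assumed two-compartment model)."""
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    return target_fraction * target_fold + (1.0 - target_fraction)


# e.g. a minor cell population supplying 10% of the transcripts changes 10-fold
print(tissue_fold_change(0.1, 10.0))
```

In this model a 10-fold change confined to cells supplying 10% of the transcripts appears as only a 1.9-fold change in the whole tissue, and the experiment alone cannot say which cells produced it.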

However, those taking a holistic approach may consider this issue unimportant.
There is an equally appropriate view that all those genes showing altered expression
within a compromised tissue should be taken into consideration. After all, since all
tissues are complex mixes of different, interacting cell types which intimately
regulate each other's growth and development, it is clear that each cell type could in
some way contribute (positively or negatively) towards the molecular mechanisms
which lie behind responses to external stimuli or neoplastic growth. It is perhaps
then more informative to carry out differential display experiments using in vivo as
opposed to in vitro models, where uniform populations of identical cells probably
represent a partial, skewed or even inaccurate picture of the molecular changes that
occur.

The incidence and possible implications of inter-individual biological variation
should be considered in any approach where whole animal models are being used. It
is clear that individuals (humans and animals) respond in different ways to identical
stimuli. One of the best characterized examples is the debrisoquine oxidation
polymorphism, which is mediated by cytochrome P450 CYP2D6 and determines the
pharmacokinetics of many commonly prescribed drugs (Lennard 1993, Meyer and
Zanger 1997). The reasons for such differences are varied and complex, but allelic
variations, regulatory region polymorphisms and even physical and mental health
can all contribute to observed differences in individual responses. Careful thought
should, therefore, be given to the specific objectives of the study and to the possible
value of pooling starting material (tissue/mRNA). Pooling can be beneficial,
ironing out exaggerated responses and unimportant minor fluctuations of
(mechanistically) irrelevant genes in individual animals, thus providing a clearer
overall picture of the general molecular mechanisms of the response. At the same
time, however, such minor variations may be of utmost importance in determining
the ability of individual animals to succumb to or resist the effects of a given
chemical/disease.
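The double-edged effect of pooling can be shown with simple arithmetic. The sketch assumes each animal contributes an equal amount of mRNA to the pool, so the pooled signal is the mean of the individual levels, and it expresses each treated animal's level relative to the mean control level (control and treated animals are different individuals). The expression values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def pooled_vs_individual(control: Sequence[float],
                         treated: Sequence[float]) -> Tuple[float, List[float]]:
    """Fold change seen in an equal-contribution pool versus per-animal fold
    changes, both relative to the mean control level (assumed model)."""
    if not control or not treated:
        raise ValueError("need at least one control and one treated animal")
    control_mean = mean(control)
    pooled = mean(treated) / control_mean
    individual = [level / control_mean for level in treated]
    return pooled, individual


# Hypothetical: three treated animals do not respond, one responds 10-fold.
pooled, individual = pooled_vs_individual([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 10.0])
print(pooled, individual)
```

The pool reports a moderate 3.25-fold induction that no single animal actually showed: the outlier is smoothed, as intended, but the fact that only one animal responded is lost.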



How efficient are differential expression techniques at recovering a high percentage of
differentially expressed genes?

A number of groups have produced experimental data suggesting that mammalian
cells produce between 8000 and 15 000 different mRNA species at any one time
(Mechler and Rabbitts 1981, Hedrick et al. 1984, Bravo 1990), although figures as
high as 20 000-30 000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984)
provided evidence suggesting that the majority of these belong to the rare abundance
class. A breakdown of this abundance distribution is shown in table 1.

When the results of differential display experiments have been compared with
data obtained previously using [...]
expressed mRNAs are represented in the final display. In particular, rare messages
(which, importantly, often encode regulatory proteins) are not easily recovered
using differential display systems [...] as the majority of
mRNA species exist at levels of less than [...]% of the total population (table 1).
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous
mRNA populations) for recovering rare messages and were unable to detect mRNA
species present at less than 1.2% of the total mRNA population, equivalent to an
intermediate or abundant species. Interestingly, when simple model systems (single
target only) were used instead of a heterogeneous mRNA population, the same
primers could detect levels of target mRNA 10 000-fold lower. These results
are probably best explained by competition for substrates from the many PCR
products produced in a DD reaction.
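To see what a 1.2% detection limit means in molecules, a percentage of the mRNA population can be converted into copies per cell. The total of 300 000 mRNA molecules per cell used below is an assumed round figure for illustration, not a value from this text; it is exposed as a parameter so other estimates can be substituted.

```python
def copies_per_cell(percent_of_mrna: float,
                    mrna_molecules_per_cell: float = 300_000) -> float:
    """Copies per cell of a species making up `percent_of_mrna` percent of the
    mRNA population; the default total per cell is an assumed round figure."""
    return percent_of_mrna / 100.0 * mrna_molecules_per_cell


print(copies_per_cell(1.2))          # DD detection limit in the heterogeneous template
print(copies_per_cell(1.2 / 10_000))  # 10 000-fold lower, the single-target limit
```

Under that assumption, the 1.2% limit corresponds to about 3600 copies per cell, far above the few copies per cell typical of the rare messages discussed earlier, whereas the 10 000-fold lower single-target limit falls below one copy per cell.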

The numbers of differentially expressed mRN As reported in the literature using 
various model systems provides funher evidence that many differentially expressed 
mRNAs arc not recovered. For example, DeRisi et aL (1997) used DNA array 
technology to examine gene expression in yeast following exhaustion of sugar m the 
medium, and found that more than 1700 genes showed a change m expression of at 
least 2-fold. In light of such a finding, it would not be unreasonable to suggest chat 
of the 8000-1 5 000 different mRNA species produced by any given mammalian cell, 
up to 1000 or more may show altered expression following chemical stimulation. 
Whilst this may be an extreme figure, it is known that at least 100 genes are 
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (UUman et aL 
1990). In addition, Wan et aL (1996) estimated that inierferon-y-stimulated HeLa 
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs 
expressed by the cells). However, there have been few publications documenting 
anywhere near the recovery* of these numbers. For example, in usmg DD to compare 
normal and regenerating mouse liver, Bauer et aL (1993) found only 70 of 38000 
total bands to be different. Of these. SQ% (35 genes) were shown to correspond to 
differentially expressed bands, Chen et aL (1996) reported 10 genes upregulaied m 
female^ rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997) 
identified 14 different gene products whose expression was altered by phorbol 
myristate acetate (PMA, a tumour promoter agent) stimulation of a human 
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene
products whose expression was upregulated in the peripheral blood leukocytes of
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially
expressed between young and senescent fibroblasts. Techniques other than DD
have also provided an apparent paucity of differentially expressed genes. Using SH,
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal
cancer compared to normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17
genes upregulated in rat liver following treatment with the peroxisome proliferator
clofibrate. Philips et al. (1990) isolated 12 cDNA clones which were upregulated in
highly metastatic mammary adenocarcinoma cell lines compared to poorly
metastatic ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and
identified approximately 40 genes showing altered expression within 4 h of
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene
fragments isolated using SSH from the delayed early response phase of liver regeneration
and found only 12 to be upregulated.
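The scale of the shortfall can be checked with simple arithmetic on two of the figures quoted above. This Python sketch uses only numbers stated in the text (Wan et al. 1996; Bauer et al. 1993); the helper name `percent` is illustrative.

```python
def percent(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole

# Wan et al. (1996): up to 433 differentially expressed genes out of an
# assumed 24000 distinct mRNAs in interferon-gamma-stimulated HeLa cells.
wan = percent(433, 24000)

# Bauer et al. (1993): 70 of 38000 DD bands differed, and 35 of those 70
# corresponded to genuinely differentially expressed bands.
bauer_bands = percent(70, 38000)
bauer_confirmed = percent(35, 70)

print(f"{wan:.2f}% {bauer_bands:.2f}% {bauer_confirmed:.0f}%")  # 1.80% 0.18% 50%
```

The two studies measure different things (estimated mRNA species versus gel bands), so the comparison is only indicative of the gap between expected and recovered numbers.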



In this laboratory, SSH has been used to isolate a number of candidate genes which appear
to show altered expression in guinea pig liver following short-term treatment with
the peroxisome proliferator Wy-14,643 (J. Rockett, unpublished observations). However, these findings have still to be confirmed by
analysis of the extracted tissue mRNA for differential expression of these sequences.

Whilst the latest differential display technologies are purported to include design
and experimental modifications to overcome this lack of efficiency (in both the total
number of differentially expressed genes recovered and the percentage that are true
positives), it is still not clear if such adaptations are practically effective: proving
efficiency by spiking with a known amount of limited numbers of artificial
construct(s) is one thing, but isolating a high percentage of the rare messages already
present in an mRNA population is another. Of course, some models will genuinely
produce only a small number of differentially expressed genes. In addition, there are
also technical problems that can reduce efficiency. For example, mRNAs may have
an unusual primary structure that effectively prevents their amplification by
PCR-based systems. In addition, it is known that under certain circumstances not all
mRNAs have 3' polyA sites. For example, during Xenopus development,
deadenylation is used as a means to stabilize RNAs (Voeltz and Steitz 1998), whilst
preferential deadenylation may play a role in regulating Hsp70 (and perhaps,
therefore, other stress protein) expression in Drosophila (Dellavalle et al. 1994). The
presence of deadenylated mRNAs would clearly reduce the efficiency of systems
utilizing a polydT reverse transcription step. The efficiency of any system also 
depends on the quality of the starting material. All differential display techniques 
use mRNA as their target material. However, it is difficult to isolate mRNA that is 
completely free of ribosomal RNA. Even if polydT primers are used to prime first
strand cDNA synthesis, ribosomal RNA is often transcribed to some degree 
(Clontech PCR-Select cDNA Subtraction kit user manual). It has been shown, at 
least in the case of SSH, that a high rRNA:mRNA ratio can lead to inefficient 
subtractive hybridization (Clontech PCR-Select cDNA Subtraction kit user 
manual), and there is no reason to suppose that it will not do likewise in other SH 
approaches. Finally, those techniques that utilise a presubtraction amplification step 
(e.g. RDA) may present a skewed representation since some sequences amplify 
better than others. 

Of course, probably the most important consideration is the temporal factor. It
is clear that any given differential display experiment can only interrogate a cell at
one point in time. It may well be that a high percentage of the genes showing altered
expression at that time are obtained. However, given that disease processes and
responses to environmental stimuli involve dynamic cascades of signalling,
regulation, production and action, it is clear that all those genes which are switched
on/off at different times will not be recovered and, therefore, vital information may
well be missed. It is, therefore, imperative to obtain as much information about the
model system beforehand as possible, from which a strategy can be derived for
targeting specific time points or events that are of particular interest to the
investigator. One way of getting round this problem of single time point analysis is
to conduct the experiment over a suitable time course which, of course, adds 
substantially to the amount of work involved. 



How sensitive are differential expression technologies?

There has been little published data that addresses the issue of how large a
change in expression must be for detection of the gene in question with
the various differential expression technologies. Although the isolation of genes
whose expression is changed by as little as [...]-fold has been reported using SSH
(Groenink and Leegwater 1996), it appears that those demonstrating a change in
excess of 5-fold are more likely to be picked up. Thus, there is a 'grey zone'
in between where small changes could fade in and out of isolation between
experiments and animals. DD, on the other hand, is not subject to this grey
zone since, unlike SH approaches, it does not amplify the difference in expression
between two samples. Wan et al. (1996) reported that differences in expression of
twofold or more are detectable using DD.



Resolution and visualization of differential expression products

It seems highly improbable with current technology that a gel system could be
developed that is able to resolve all gene species showing altered expression in any
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is
used as standard in DD experiments. Even so, it is clear that a complex series of gene
products such as those seen in a DD will contain unresolvable components. Thus,
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single
band extracted from a DD often represents a composite of heterogeneous products,
and the same has been found for SSH displays in this laboratory (Rockett et al.
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who
extracted and reamplified candidate bands from a DD display and used single strand 
conformation polymorphism (SSCP) analysis to confirm which components 
represented the truly differentially expressed product. 

Many scientists often try to avoid the use of PAGE where possible because it is 
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately, 
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate
than PAGE, can only separate DNA sequences which differ in size by around
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such
products which differ in size by less than this amount are normally not resolvable.
However, a simple technique does in fact exist for increasing the resolving power of
AGE: the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow
(bisbenzimide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in an agarose
gel separates identically or closely sized products on the basis of base content. Specifically,
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both
HA-stains possess an overall positive charge, they migrate towards the cathode
when an electric field is applied. This is in direct opposition to DNA, which
is negatively charged and, therefore, migrates towards the anode. Thus, if two
DNA clones are identical in size (as perceived on a standard high resolution
agarose gel), but differ in AT/GC content, inclusion of an HA-dye in the gel
will effectively retard the migration of one of the sequences compared to the
other, effectively making it apparently larger and, thus, providing a means of
differentiating between the two. The use of HA-red has been shown to resolve
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse
Analytik have reported that HA-stains can be used
to distinguish two 567 bp sequences which differed by only a single point mutation
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check
whether all the clones produced from a specific band in a differential display 
experiment are derived from the same gene species, a small amount of reamplified
or digested clone can be run on a standard high resolution gel, and a second aliquot






J. C. Rockett et al.




Figure 10. Discrimination of clones of identical or nearly identical size using HA-red. Bands of decreasing
size (1-5) were extracted from the final display of a suppression subtractive hybridization
experiment and cloned. Seven colonies were picked at random from each cloned band and their
inserts amplified using PCR. The products were run on two gels: (A) a high resolution 2% agarose
gel, and (B) a high resolution 1% agarose gel containing 1 U/ml HA-red. With few exceptions, all
the clones from each band appear to be the same size (gel A). However, the presence of HA-red
(gel B), which separates identically sized DNA fragments based on the percentage of GC within
the sequence, clearly indicates the presence of different gene species within each band. For
example, even though all five re-amplified clones of band 1 appear to be the same size, at least four
different gene species are represented.



in a similar gel containing one of the HA-stains. The standard gel should indicate
any gross size differences, whilst the HA-stained gel should separate otherwise
unresolvable species (on standard AGE) according to their base content. Geisinger
et al. (1997) reported successful use of this approach for identifying DD-derived
clones. Figure 10 shows such an experiment carried out in this laboratory on clones
obtained from a band extracted from an SSH display.
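The resolution figures quoted above translate into absolute size differences as follows. This Python sketch assumes only the approximate limits cited in the text (about 0.2% for PAGE, about 1.5-2% for high-resolution agarose, taking the optimistic 1.5%); the dictionary and function names are hypothetical, and real gels vary with concentration, run time and fragment size.

```python
# Approximate resolving limits quoted in the text, as % of fragment length.
RESOLUTION_PCT = {
    "PAGE": 0.2,              # Sambrook et al. (1989)
    "high-res agarose": 1.5,  # 1.5-2%; the optimistic end of the range
}

def min_resolvable_bp(fragment_bp: int, gel: str) -> float:
    """Smallest size difference (bp) expected to resolve for a fragment of this size."""
    return fragment_bp * RESOLUTION_PCT[gel] / 100

def resolvable(size_a: int, size_b: int, gel: str) -> bool:
    """True if two fragments differ by at least the gel's resolving limit."""
    return abs(size_a - size_b) >= min_resolvable_bp(max(size_a, size_b), gel)

for gel in RESOLUTION_PCT:
    print(gel, min_resolvable_bp(1000, gel), resolvable(1000, 990, gel))
# PAGE 2.0 True
# high-res agarose 15.0 False
```

A 10 bp difference between two 1 kb clones is therefore expected to separate on PAGE but not on high-resolution agarose, which is where base-composition separation with an HA-stain becomes useful.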

An alternative approach is to carry out a 2-D analysis of the differential display
products. In this approach, size-based separation is first carried out in a standard
agarose gel. The gel slice containing the display is then extracted and incorporated
into an HA gel for resolution based on AT/GC content.
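The logic of the two-dimensional approach (size first, then base composition) can be sketched by grouping clones on both properties. In this Python sketch the clone sequences are hypothetical, and the bin widths (20 bp, 5% GC) are assumed stand-ins for the resolving power of each gel dimension, not measured values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def gc_percent(seq: str) -> int:
    """Integer percentage of G + C bases in a DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) // len(seq)

def two_d_groups(clones: Dict[str, str], size_bin_bp: int = 20,
                 gc_bin_pct: int = 5) -> Dict[Tuple[int, int], List[str]]:
    """Group clones by (size bin, GC bin): dimension 1 = size gel, dimension 2 = HA gel."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, seq in clones.items():
        key = (len(seq) // size_bin_bp, gc_percent(seq) // gc_bin_pct)
        groups[key].append(name)
    return dict(groups)

# Hypothetical inserts cloned from a single band: all are 40 bp long.
clones = {
    "clone1": "GCGCGCGCGC" * 4,  # 100% GC
    "clone2": "GCGCGCGCGC" * 4,  # same species as clone1
    "clone3": "ATATATATGC" * 4,  # 20% GC
    "clone4": "ATGCATGCAT" * 4,  # 40% GC
}
groups = two_d_groups(clones)
print(len(groups))  # 3 species, although a size gel alone shows one band
for key, names in sorted(groups.items()):
    print(key, names)
```

As the text notes, species of identical size and base content would still fall into one group here, which is why SSCP, DGGE or TGGE may be needed.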

Of course, one should always consider the possibility of there being different
gene species which are the same size and have the same GC/AT content. However,
even these species are not unresolvable given some effort. Again, one might use
SSCP, or perhaps a denaturing gradient gel electrophoresis (DGGE) or temperature
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band,
either directly on the extracted band (Suzuki et al. 1991) or on the reamplified [...]

The requirement of some differential display techniques to visualize large
numbers of products (e.g. DD and [...]) can also pose a problem since, in terms
of numbers, the resolution of PAGE rarely exceeds 100-400 bands. One approach to
overcoming this might be to use 2-D gels such as those described by Uitterlinden et
al. (1989) and Hatada et al. (1991).

Extraction of differentially expressed bands from a gel can be complex since, in
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means,
such that precise overlay of the developed film on the gel must occur if the correct
band is to be extracted for further analysis. Clearly, a misjudged extraction can
account for many man-hours lost. This problem, and that of the use of radioisotopes,
has been addressed by several groups. For example, Lohmann et al. (1995)
demonstrated that silver staining can be used directly to visualize DD bands in
horizontal polyacrylamide gels. An et al. (1996) avoided the use of radioisotopes by transferring a
small amount (20-30%) of the DNA from their DD to a nylon membrane, and
visualizing the bands using chemiluminescent staining before going back to extract
the remaining DNA from the gel. Chen and Peck (1996) went one step further and
transferred the entire DD to a nylon membrane. The DNA bands were then
visualized using a digoxigenin (DIG) system (DIG was attached to the polydT
primers used in the differential display procedure). Differentially expressed bands 
were cut from the membrane and the DNA eluted by washing with PCR buffer prior 
to reamplification. 

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium
bromide staining. Whilst this approach can provide acceptable results, overstaining 
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) effectively enhances 
the intensity and sharpness of the bands. This greatly aids in their precise extraction 
and often reveals some faint products that may otherwise be overlooked. Whilst 
differential displays stained with SYBR Green I are better visualized using short 
wavelength UV (254 nm) rather than medium wavelength (306 nm), the shorter 
wavelength is much more DNA damaging. In practice, it takes only a few seconds 
to damage DNA extracted under 254 nm irradiation, effectively preventing 
reamplification and cloning. The best approach is to overstain with SYBR Green I
and extract bands under a medium wavelength UV transillumination. 



The possible use of 'microfingerprinting' to reduce complexity

Given the sheer number of gene products and the possible complexity of each
band, an alternative approach to rapid characterization may be to use an enhanced
analysis of a small section of a differential display: a 'sub-fingerprint' or
'micro-fingerprint'. In this case, one could concentrate on those bands which only appear
in a particular chosen size region. Reducing the fingerprint in this way has at least
two advantages. One is that it should be possible to use different gel types,
concentrations and run times tailored exactly to that region. Currently, one might
run products from 100 to 3000+ bp on the same gel, which leads to compromise in the
gel system being used and consequently to suboptimal resolution, both in terms of
size and numbers, and can lead to problems in the accurate excision of individual
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis
incorporating an HA-stain, as described earlier. In summary, if a range of gene product sizes
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized,
and appropriate gene analysis used, this could allow
early and rapid identification of compounds which have similar or widely different
cellular effects. If the prognosis for exposure to one or more other chemicals which
display a similar profile is already known, then one could perhaps predict similar
effects for any new compounds which show a similar micro-fingerprint.
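A micro-fingerprint comparison of this kind might be sketched as follows: keep only the altered-expression bands inside a chosen size window, then score how similar two compounds' band sets are. The Jaccard index, the 300-500 bp window and the band lists are all illustrative assumptions rather than part of the method described above, and matching bands by size alone is a simplification.

```python
from typing import Iterable, Set

def micro_fingerprint(band_sizes: Iterable[int], low: int, high: int) -> Set[int]:
    """Keep only bands (sizes in bp) that fall inside the chosen size window."""
    return {size for size in band_sizes if low <= size <= high}

def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared bands divided by all bands seen in either fingerprint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical altered-expression band sizes (bp) for three compounds.
known_compound = [150, 320, 410, 455, 980, 1500]
new_compound = [150, 320, 410, 460, 2100]
unrelated = [180, 350, 700]

window = (300, 500)  # assumed size region of interest
ref = micro_fingerprint(known_compound, *window)
new = micro_fingerprint(new_compound, *window)
other = micro_fingerprint(unrelated, *window)

print(round(jaccard(ref, new), 2))    # 0.5
print(round(jaccard(ref, other), 2))  # 0.0
```

A high score against a compound whose effects are already known would flag the new compound for closer study, in the spirit of the prediction suggested above.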
An alternative approach to microfingerprinting is to examine altered expression
in specific families of genes through careful selection of PCR primers and/or
post-reaction analysis. Stress genes, growth factors and/or their receptors, cell cycling
genes, cytochromes P450 and regulatory proteins might be considered as candidates
for analysis in this way. Indeed, some off-the-shelf DNA arrays (e.g. Clontech's
Atlas cDNA Expression Array series) already anticipate this to some degree by
grouping together genes involved in different responses, e.g. apoptosis, stress,
DNA-damage response etc.



Screening 

False positives 

The generation of false positives has been discussed at length amongst the 
differential display community (Liang et al. 1993, 1995, Nishio et al. 1994, Sun et al.
1994, Sompayrac et al. 1995). The reason for false positives varies with the 
technique being used. For instance, in RDA, the use of adaptors which have not 
been HPLC purified can lead to the production of false positives through illegitimate 
ligation events (O'Neill and Sinclair 1997), whilst in DD they can arise through 
PCR artifacts and illegitimate transcription of rRNA. In SH, false positives appear
to be derived largely from abundant gene species, although some may arise from 
cDNA/mRNA species which do not undergo hybridization for technical reasons. 

A quick screening of putative differentially expressed clones can be carried out
using a simple dot blot approach, in which labelled first strand probes synthesized
from tester and driver mRNA are hybridized to an array of said clones (Hedrick et
al. 1984, Sakaguchi et al. 1986). Differentially expressed clones will hybridize to
tester probe, but not driver. The disadvantage of this approach is that rare species 
may not generate detectable hybridization signals. One option for those using SSH 
is to screen the clones using a labelled probe generated from the subtracted cDNA 
from which it was derived, and with a probe made from the reverse subtraction 
reaction (ClonTechniques 1997a). Since the SSH method enriches rare sequences, 
it should be possible to confirm the presence of clones representing low abundance
genes. Despite this quick screening step, there is still the need to go back to the
original mRNA and confirm the altered expression using a more quantitative
approach. Although this may be achieved using Northern blots, the sensitivity is
poor by today's high standards and one must rely on PCR methods for accurate and 
sensitive determinations (see below). 
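The dot-blot decision rule described above (signal with the tester probe, none with the driver probe) can be expressed as a simple classifier. This Python sketch assumes background-subtracted signal intensities in arbitrary units and a hypothetical detection threshold of 100; because rare species may give no detectable signal, clones that fail both probes are flagged rather than discarded.

```python
from typing import Dict, Tuple

def classify_clone(tester: float, driver: float, threshold: float = 100.0) -> str:
    """Classify one arrayed clone from its tester- and driver-probe signals.

    `threshold` is a hypothetical detection limit in arbitrary units.
    """
    tester_hit = tester >= threshold
    driver_hit = driver >= threshold
    if tester_hit and not driver_hit:
        return "candidate"          # hybridizes to tester probe only
    if tester_hit and driver_hit:
        return "not differential"
    if driver_hit:
        return "driver only"
    return "undetected"             # possibly a rare species; confirm by PCR

# Hypothetical dot-blot signals: clone -> (tester, driver)
signals: Dict[str, Tuple[float, float]] = {
    "A1": (850.0, 20.0),
    "A2": (900.0, 870.0),
    "A3": (15.0, 10.0),
}
results = {clone: classify_clone(t, d) for clone, (t, d) in signals.items()}
print(results)
```

Any "candidate" from such a screen would still need quantitative confirmation against the original mRNA, as the text stresses.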



Sequence analysis 

The majority of differential display procedures produce final products which are
between 100 and 1000 bp in size. However, this may considerably reduce the length of
sequence available for searching the DNA databases. This in turn leads to reduced
confidence in the result: several families of genes may have members whose
sequences are almost identical apart from a few key mismatches, e.g. the cytochrome
P450 gene superfamily (Nelson et al. 1996). Thus, does the clone identified as being
almost identical to gene X really come from that gene, or from its brother gene, or from its
as yet undiscovered sister gene? For example, using SSH, part of a gene was isolated
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified
by a FASTA search as being transferrin (data not shown). However, transferrin is 
known to be downregulated by hypolipidemic peroxisome proliferators such as
Wy-14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR
analysis. This suggests that the gene sequence isolated may belong to a gene which
is closely related to transferrin, but is regulated by a different mechanism.

A further problem associated with SH technology is redundancy. In most cases
before SH is carried out, the cDNA population must first be simplified by restriction
digestion. This is important for at least two reasons:

(1) To reduce complexity — long cDNA fragments may form complex networks 
which prevent the formation of appropriate hybrids, especially at the high 
concentrations required for efficient hybridization. 

(2) Cutting the cDNAs into small fragments provides better representation of
individual genes. This is because genes derived from related but distinct
members of gene families often have similar coding sequences that may cross- 
hybridize and be eliminated during the subtraction procedure (Ko 1990). 
Furthermore, different fragments from the same cDNA may differ considerably 
in terms of hybridization and amplification and, thus, may not efficiently do one 
or the other (Wang and Brown 1991). Thus, some fragments from differentially
expressed cDNAs may be eliminated during subtractive hybridization
procedures. However, other fragments may be enriched and isolated. As a
consequence of this, some genes will be cut one or more times, giving rise to two 
or more fragments of different sizes. If those same genes are differentially 
expressed, then two or more of the different size fragments may come through 
as separate bands on the final differential display, increasing the observed 
redundancy and increasing the number of redundant sequencing reactions. 
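The redundancy problem can be illustrated by digesting a sequence in silico. This Python sketch assumes a four-base blunt cutter recognizing GT^AC (the RsaI site, which is commonly used in SSH protocols, although the text does not name an enzyme) and uses a made-up cDNA; every fragment it returns could, if differentially expressed, appear as a separate band.

```python
from typing import List

def digest(seq: str, site: str = "GTAC", cut_offset: int = 2) -> List[int]:
    """Fragment lengths after cutting `seq` at every occurrence of `site`.

    The defaults model an RsaI-like blunt cut (GT^AC); this enzyme choice is an
    assumption for illustration only.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)      # cut position inside the site
        pos = seq.find(site, pos + 1)      # look for the next site
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical 300 bp cDNA carrying two internal GTAC sites.
cdna = "A" * 100 + "GTAC" + "T" * 96 + "GTAC" + "C" * 96
fragments = digest(cdna)
print(fragments)        # [102, 100, 98]
print(len(fragments))   # 3 potential bands from a single gene
```

If two or more of these fragments survive subtraction, the same gene is cloned and sequenced more than once.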

Sequence comparisons also throw up another important point: at what degree
of sequence similarity does one accept a result? Is 90% identity between a gene
derived from your model species and one from another species acceptably close? Is 95% between
your sequence and one from the same species also acceptable? This problem is
particularly relevant when the forward and reverse sequence comparisons give
similar sequences with completely different gene species! An arbitrary decision
seems to be to allocate genes that are definite (95% similarity and above) and then
group those between 60 and 95% as being related or possible homologues.
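The rule of thumb above can be made explicit. This Python sketch computes percentage identity over an existing gapless alignment of equal-length sequences (a simplification; database search tools such as FASTA report identity from their own alignments) and applies the 95% cut-off; the 60% lower bound follows the text and is a parameter because any such threshold is arbitrary. The sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) between two already-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

def classify_match(identity: float, definite: float = 95.0, related: float = 60.0) -> str:
    """Apply the arbitrary cut-offs: >= definite is the same gene, >= related a homologue."""
    if identity >= definite:
        return "definite"
    if identity >= related:
        return "related/possible homologue"
    return "unassigned"

query = "ATGGCTAGCTAGGATCCAAT"   # hypothetical 20 bp clone fragment
hit   = "ATGGCTAGCTAGGATCCTAT"   # hypothetical database hit, 1 mismatch
pid = percent_identity(query, hit)
print(pid, classify_match(pid))  # 95.0 definite
```

Note that on a 20 bp fragment a single mismatch moves the result across the 95% boundary, which illustrates why short display products give less confident assignments.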
Quantitative analysis

At some point, one must give consideration to the quantitative analysis of the
candidate genes, either as a means of confirming that they are truly differentially
expressed, or in order to establish just what the differences are. Northern blot
analysis is a popular approach as it is relatively easy and quick to perform. However,
the major drawback with Northern blots is that they are often not sensitive enough
to detect rare sequences. Since the majority of expressed genes are of low
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the
method of choice for confirming differential expression. Although the procedure is
somewhat more complex than Northern analysis, requiring synthesis of primers and
optimization of reaction conditions for each gene species, it is now possible to set up
high throughput PCR systems using multichannel pipettes, 96-well (and larger) plates and
appropriate thermal cycling technology. Whilst quantitative analysis is more
desirable, being more accurate and without reliance on an internal standard, the 
money and time needed to develop a competitor molecule is often excessive, 
especially when one might be examining tens or even hundreds of gene species. The 
use of semi-quantitative analysis is simpler, although still relatively involved. One
must first of all choose an internal standard that does not change in the test cells
compared to the controls. Numerous reference genes have been tried in the past, for
example interferon-gamma (IFN-γ, Frye et al. 1989), β-actin (Heuval et al. 1994),
glyceraldehyde-3-phosphate dehydrogenase (GAPDH, Wong et al. 1994),
dihydrofolate reductase (DHFR, Mohler and Butler 1991), β2-microglobulin
(β2m, Murphy et al. 1990), hypoxanthine phosphoribosyl transferase (HPRT, Foss et
al. 1998) and a number of others (ClonTechniques 1997b). Ideally, an internal
standard should not change its level of expression in the cell regardless of cell age,
stage in the cell cycle or through the effects of external stimuli. However, it has been
shown on numerous occasions that the levels of most housekeeping genes currently
used by the research community do in fact change under certain conditions and in
different tissues (ClonTechniques 1997b). It is imperative, therefore, that
preliminary experiments be carried out on a panel of housekeeping genes to establish
their suitability for use in the model system.
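The preliminary screen of housekeeping genes and the subsequent semi-quantitative normalization might be sketched as follows. Ranking candidates by coefficient of variation across control and treated samples is one simple stability criterion chosen here for illustration (the text does not prescribe one), and all intensities are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

def coefficient_of_variation(values: List[float]) -> float:
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

def most_stable(candidates: Dict[str, List[float]]) -> str:
    """Pick the candidate reference gene whose signal varies least across samples."""
    return min(candidates, key=lambda gene: coefficient_of_variation(candidates[gene]))

def normalized_fold_change(target_ctrl: float, ref_ctrl: float,
                           target_test: float, ref_test: float) -> float:
    """Fold change of the target/reference ratio in test versus control samples."""
    return (target_test / ref_test) / (target_ctrl / ref_ctrl)

# Hypothetical RT-PCR band intensities: two control then two treated samples.
panel = {
    "GAPDH": [1000.0, 980.0, 1500.0, 1450.0],   # shifts with treatment
    "HPRT": [400.0, 410.0, 395.0, 405.0],       # roughly constant
}
reference = most_stable(panel)
fold = normalized_fold_change(target_ctrl=200.0, ref_ctrl=400.0,
                              target_test=1000.0, ref_test=400.0)
print(reference, fold)  # HPRT 5.0
```

Had the unstable GAPDH values been used as the standard, the apparent fold change of the target would have been distorted by the treatment effect on the reference itself.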

Interpretation of quantitative data must also be treated with caution. By 
comparing the lists of genes identified by differential expression one can perhaps 
gain insight into why two different species react in different ways to external stimuli. 
For example, rats and mice appear sensitive to the non-genotoxic effects of a wide 
range of peroxisome proliferators whilst Syrian hamsters and guinea pigs are largely 
resistant (Orton et al. 1984, Rodricks and Turnbull 1987, Lake et al. 1989, 1993,
Makowska et al. 1992). A simplified approach to resolving the reason(s) why is to
compare lists of up- and down- regulated genes in order to identify those which are 
expressed in only one species and, through background knowledge of the effects of 
the said gene, might suggest a mechanism of facilitated non-genotoxic carcinogenesis
or protection. Of course, the situation is likely to be far more complex. Perhaps if 
there were one key gene protecting guinea pigs from non-genotoxic effects and it was
upregulated 50 times by PPs, the same gene might only be upregulated five times
in the rat. However, since both were noted to be upregulated, the importance of the
gene may be overlooked. Just to complicate matters, a large change in expression
does not necessarily mean a biologically important change. For example, what is the
true relevance of gene Y, which shows a 50-fold increase after a particular treatment,
and gene Z, which shows only a 5-fold increase? If one examines the literature one
may find that historically, gene Y has often been shown to be upregulated 40-60-fold
by a number of unrelated stimuli; in light of this the 50-fold increase would
appear less significant. However, the literature may show that gene Z has never been
recorded as having more than doubled in expression, which makes your 5-fold
increase all the more exciting. Perhaps even more interesting is if that same 5-fold
increase has only been seen in [...] neoplasm or following treatment with related
chemicals.
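One way to put an observed fold change into the historical context described above is to express it relative to the largest change previously reported for that gene. This Python sketch uses the hypothetical genes Y and Z from the text, taking 60-fold and 2-fold as their historical maxima; the ratio is a heuristic for prioritization, not a measure of biological importance.

```python
def relative_to_history(observed_fold: float, historical_max_fold: float) -> float:
    """Observed fold change divided by the largest previously reported change."""
    return observed_fold / historical_max_fold

# Illustrative figures from the text: gene Y often changes 40-60-fold,
# gene Z has never been recorded as more than doubling.
genes = {
    "Y": {"observed": 50.0, "historical_max": 60.0},
    "Z": {"observed": 5.0, "historical_max": 2.0},
}
for name, g in genes.items():
    ratio = relative_to_history(g["observed"], g["historical_max"])
    print(name, round(ratio, 2))
# Y 0.83  (within its usual range)
# Z 2.5   (well beyond anything reported)
```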

Problems in using the differential display approach

Differential display technology originally held promise of an easily obtainable 
'fingerprint' of those genes which are up- or down-regulated in test animals/cells in 
a developmental process or following exposure to given stimuli. However, it has
become clear that the fingerprinting process, whilst still valid, is much too complex
to be represented by a single technique profile. This is because all differential display 
techniques have common and/or unique technical problems which preclude the 
isolation and identification of all those genes which show changes in expression.
Furthermore, there are important genetic changes related to disease development
which differential expression analysis is simply not designed to address. An example 
of this is the presence of small deletions, insertions, or point mutations such as those 
seen in activated oncogenes, tumour suppressor genes and individual poly- 
morphisms. Polymorphic variations, small though they usually are, are often 
regarded as being of paramount importance in explaining why some patients 
respond better than others to certain drug treatments (and, in logical extension, why 
some people are less affected by potentially dangerous xenobiotics/carcinogens than
others). The identification of such point mutations and naturally occurring 
polymorphisms requires the subsequent application of sequencing, SSCP, DGGE 
or TGGE to the gene of interest. Furthermore, differential display is not designed 
to address issues such as alternatively spliced gene species or whether an increased
abundance of mRNA is a result of increased transcription or increased mRNA
stability.
Conclusions

Perhaps the main advantage of open system differential display techniques is that
they are not limited by extant theories or researcher bias in revealing genes which are
differentially expressed, since they are designed to amplify all genes which
demonstrate altered expression. This means that they are useful for the isolation of
previously unknown genes which may turn out to be useful biomarkers of a particular
state or condition. At least one open system (SAGE) is also quantitative, thus 
eliminating the need to return to the original mRNA and carry out Northern/PCR
analysis to confirm the result. However, the rapid progress of genome mapping 
projects means that over the next 5-10 years or so, the balance of experimental use 
will switch from open to closed differential display systems, particularly DNA 
arrays. Arrays are easier and faster to prepare and use, provide quantitative data, are
suitable for high throughput analysis and can be tailored to look at specific signalling
pathways or families of genes. Identification of all the gene sequences in human and
common laboratory animals combined with improved DNA array technology, 
means that it will soon no longer be necessary to try to isolate differentially expressed 
genes using the technically more demanding open system approach. Thus, their 
main advantage (that of identifying unknown genes) will be largely eradicated. It is
likely, therefore, that their sphere of application will be reduced to analysis of the
less common laboratory species, since it will be some time yet before the genomes of
such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, will
be sequenced.

Of course, in the end the question will always remain: What is the functional/
biological significance of the identified, differentially expressed genes? One 
persistent problem is understanding whether differentially expressed genes are a 
cause or consequence of the altered state. Furthermore, many chemicals, such as
non-genotoxic carcinogens, are also mitogens and so genes associated with
replication will also be upregulated but may have little or nothing to do with the
carcinogenic effect. Whilst differential display technology cannot hope to answer
these questions, it does provide a springboard from which identification, regulatory
and functional studies can be launched. Understanding the molecular mechanism of
cellular responses is almost impossible without knowing the regulation and function
of those genes and their condition (e.g. mutated). In an abstract sense, differential
display can be likened to a still photograph, showing details of a fixed moment in 
time. Consider the historian who knows the outcome of a battle and the placement
and condition of the troops before the battle commenced, but is asked to try and
deduce how the battle progressed and why it ended as it did from a few still
photographs: an impossible task. In order to understand the battle, the historian
must find out the capabilities and motivation of the soldiers and their commanding
officers, what the orders were and whether they were obeyed. He must examine the
terrain, the remains of the battle and consider the effects the prevailing weather
conditions exerted. Likewise, if mechanistic answers are to be forthcoming, the
scientist must use differential display in combination with other techniques, such as
knockout technology, the analysis of cell signalling pathways, mutation analysis and
time and dose response analyses. Although this review has emphasized the
importance of differential gene profiling, it should not be considered in isolation and
the full impact of this approach will be strengthened if used in combination with
functional genomics and proteomics (2-dimensional protein gels from isoelectric
focusing and subsequent SDS electrophoresis, and virtual 2D maps using capillary
electrophoresis). Proteomics is attracting much recent attention as many of the
changes resulting in differential gene expression do not involve changes in mRNA
levels, as described extensively herein, but rather protein-protein, protein-DNA and
protein phosphorylation events which would require functional genomics or
proteomic technologies for investigation.

Despite the limitations of differential display technology, it is clear that many 
potential applications and benefits can be obtained from characterizing the genetic
changes that occur in a cell during normal and disease development and in response 
to chemical or biological insult. In light of functional data, such profiling will 
provide a 'fingerprint' of each stage of development or response, and in the long 
term should help in the elucidation of specific and sensitive biomarkers for different 
types of chemical /biological exposure and disease states. The potential medical and 
thcrapeunc benefits of understanding such molecular changes are almost im- 
measurable. Amongst other thmgs, such fingerprints could mdicate the family or 
even, specific ry-pe of chemical an mdividual has been exposed to plus the length 
and/or acuteness of that exposure, thus mdicating the most prudent treatment. 
They may also help uncover differences m histologically identical cancers, provide 
diagnostic tests for the earliest stages of neoplasia and^ again^ perhaps indicate the 
most efficacious treatment. ~ 

The Human Genome Project will be completed early in the next century and the DNA sequence of all the human genes will be known. The continuing development and evolution of differential gene expression technology will ensure that this knowledge contributes fully to the understanding of human disease processes.
Abstract

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact the pharmaceutical drug development process. The perception that cells and whole organisms express specific inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers.
© 2000 Elsevier Science Ireland Ltd. All rights reserved. 
Expression profiling in toxicology



1. Introduction 

The majority of drugs act by binding to protein targets, most to known proteins representing enzymes, receptors and channels, resulting in effects such as enzyme inhibition and impairment of signal transduction. The treatment-induced perturbations provoke feedback reactions aiming to compensate for the stimulus, which are almost always associated with signals to the nucleus, resulting in altered gene expression. Such gene expression changes account for both the pharmacological action and the toxicity of a drug and can be visualized by either global mRNA or global protein expression profiling. Hence, for each individual drug, a characteristic gene regulation pattern, its molecular fingerprint, exists which bears valuable information on its mode of action and its mechanism of toxicity.

Gene expression is a multistep process that results in an active protein (Fig. 1). There exist numerous regulation systems that exert control at and after the transcription and the translation step. Genomics, by definition, encompasses the quantitative analysis of transcripts at the mRNA level, while the aim of proteomics is to quantify gene expression further downstream, creating a snapshot of gene regulation closer to ultimate cell function control.
2. Global mRNA profiling

Expression data at the mRNA level can be produced using a set of different technologies such as DNA microarrays, reverse transcript imaging, amplified fragment length polymorphism (AFLP), serial analysis of gene expression (SAGE) and others. Currently, DNA microarrays are very popular and show great potential. On a typical array, each gene of interest is represented either by a long DNA fragment (200-2400 bp) typically generated by polymerase chain reaction (PCR) and spotted on a suitable substrate using robotics (Schena et al., 1995; Shalon et al., 1996) or by several short oligonucleotides (20-30 bp) synthesized directly onto a solid support using photolabile nucleotide chemistry (Fodor et al., 1991; Chee et al., 1996). From control and treated tissues, total RNA or mRNA is isolated and reverse transcribed in the presence of radioactive or fluorescently labeled nucleotides, and the labeled probes are then hybridized to the arrays. The intensity of the array signal is measured for each gene transcript by either autoradiography or laser scanning confocal microscopy. The ratio between the signals of control and treated samples reflects the relative drug-induced change in transcript abundance.
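The control/treated ratio described above can be sketched in a few lines of Python. This is a minimal illustration, assuming background-corrected signal intensities are already available for each gene; the gene names, the intensities and the two-fold cutoff are hypothetical, not values from the source. Ratios are expressed on a log2 scale so that an induction and a repression of equal magnitude sit symmetrically around zero.

```python
import math

def log2_ratios(control, treated):
    """Return log2(treated / control) for every gene measured in both samples.

    `control` and `treated` map gene names to (already background-corrected)
    signal intensities. Genes with a non-positive signal are skipped because
    their ratio is undefined.
    """
    ratios = {}
    for gene, c in control.items():
        t = treated.get(gene)
        if t is None or c <= 0 or t <= 0:
            continue
        ratios[gene] = math.log2(t / c)
    return ratios

def changed_genes(ratios, fold=2.0):
    """Genes whose absolute change is at least `fold` (hypothetical cutoff)."""
    cutoff = math.log2(fold)
    return sorted(g for g, r in ratios.items() if abs(r) >= cutoff)

# Hypothetical intensities for three transcripts
control = {"geneA": 1000.0, "geneB": 500.0, "geneC": 800.0}
treated = {"geneA": 4000.0, "geneB": 250.0, "geneC": 900.0}

r = log2_ratios(control, treated)
print(r["geneA"], r["geneB"])   # 2.0 -1.0
print(changed_genes(r))         # ['geneA', 'geneB']
```

Here geneA is induced four-fold and geneB repressed two-fold, while geneC's 1.125-fold change stays below the illustrative threshold.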



3. Global protein profiling 

Global quantitative expression analysis at the protein level is currently restricted to the use of two-dimensional gel electrophoresis. This technique combines separation of tissue proteins by isoelectric focusing in the first dimension and by sodium dodecyl sulfate slab gel electrophoresis-based molecular weight separation in the second, orthogonal dimension (Anderson et al., 1991). The product is a rectangular pattern of protein spots that are typically revealed by Coomassie Blue, silver or fluorescent staining (Fig. 2). Protein spots are identified by mass spectrometry following generation of peptide mass fingerprints (Mann et al., 1993) and sequence tags (Wilkins et al., 1996). Similar to the mRNA approach, the ratios between the optical densities of spots from control and treated samples are compared to search for treatment-related changes.

4. Expression data analysis 

Bioinformatics forms a key element required to organize, analyze and store expression data from either source, the mRNA or the protein level. The overall objective, once a mass of high-quality quantitative expression data has been collected, is to visualize complex patterns of gene expression changes, to detect pathways and sets of genes tightly correlated with treatment efficacy and toxicity, and to compare the effects of different sets of treatment (Anderson et al., 1996). As the drug effect database grows, one may detect similarities and differences between the molecular fingerprints produced by various drugs, information that may be crucial in deciding whether to refocus or extend the therapeutic spectrum of a drug candidate.

Fig. 2. Computerized representation of a Coomassie Blue-stained two-dimensional gel electrophoresis pattern of Fischer F344 rat liver homogenate.
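Comparing molecular fingerprints across a drug effect database can be illustrated with a simple similarity ranking. The sketch below assumes each drug's fingerprint is a vector of log2 expression ratios over the same ordered gene list and uses Pearson correlation as the similarity measure; both the measure and the drug profiles (`drug_X`, `drug_Y`) are illustrative assumptions, not a method prescribed by the text.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def most_similar(query, database):
    """Rank reference drugs by correlation with the query fingerprint."""
    scores = {name: pearson(query, prof) for name, prof in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical log2 ratios over the same five genes for each reference drug
database = {
    "drug_X": [2.0, 1.5, -1.0, 0.0, 0.5],
    "drug_Y": [-1.0, 0.0, 2.0, 1.5, -0.5],
}
candidate = [1.8, 1.2, -0.8, 0.1, 0.4]

ranking = most_similar(candidate, database)
print(ranking[0][0])   # drug_X
```

A high positive score suggests a shared mode of action; a strongly negative one points to opposite regulation of the same genes.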



5. Comparison of global mRNA and protein expression profiling

There are several synergies and overlaps of data obtained by mRNA and protein expression analysis. Low-abundance transcripts may not be easily quantified at the protein level using standard two-dimensional gel electrophoresis analysis, and their detection may require prefractionation of samples. The expression of such genes may preferably be quantified at the mRNA level using techniques allowing PCR-mediated target amplification. Tissue biopsy samples typically yield good-quality mRNA and protein; however, the quality of mRNA isolated from body fluids is often poor due to the faster degradation of mRNA compared with proteins. RNA samples from body fluids such as serum or urine are often not very meaningful, and secreted proteins are likely more reliable surrogate markers for treatment efficacy and safety. Detection of post-translational modifications, events often related to function or nonfunction of a protein, is restricted to protein expression analysis and can rarely be predicted by mRNA profiling. Information on subcellular localization and translocation of proteins has to be acquired at the level of the protein in combination with sample prefractionation procedures. The growing evidence of a poor correlation between mRNA and protein abundance (Anderson and Seilhamer, 1997) further suggests that the two approaches, mRNA and protein profiling, are complementary and should be applied in parallel.
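One way to quantify the mRNA-protein agreement discussed above is a rank correlation computed across gene products measured at both levels. The sketch assumes paired abundance values with no ties and uses Spearman's coefficient; the numbers are invented for illustration and are not data from Anderson and Seilhamer (1997).

```python
def ranks(values):
    """Rank values from 1..n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical abundances for five gene products measured both ways
mrna = [120.0, 45.0, 300.0, 10.0, 75.0]
protein = [30.0, 80.0, 25.0, 60.0, 10.0]

rho = spearman(mrna, protein)
print(round(rho, 2))   # -0.6
```

A coefficient well below 1, as in this toy case, is the kind of result that argues for measuring both levels rather than inferring one from the other.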

6. Expression profiling and drug development 

Understanding the mechanisms of action and toxicity, and being able to monitor treatment efficacy and safety during trials, is crucial for the successful development of a drug. Mechanistic insights are essential for the interpretation of drug effects and enhance the chances of recognizing potential species specificities, contributing to an improved risk profile in humans (Richardson et al., 1993; Steiner et al., 1996b; Aicher et al., 1998). The value of expression profiling further increases when links between treatment-induced expression profiles and specific pharmacological and toxic endpoints are established (Anderson et al., 1991, 1995, 1996; Steiner et al., 1996a). Changes in gene expression are known to precede the manifestation of morphological alterations, giving expression profiling great potential for early compound screening, enabling one to select drug candidates with wide therapeutic windows reflected by molecular fingerprints indicative of high pharmacological potency and low toxicity (Arce et al., 1998). In later phases of drug development, surrogate markers of treatment efficacy and toxicity can be applied to optimize the monitoring of pre-clinical and clinical studies (Doherty et al., 1998).



7. Perspectives 

The basic methodology of safety evaluation has changed little during the past decades. Toxicity in laboratory animals has been evaluated primarily by using hematological, clinical chemistry and histological parameters as indicators of organ damage. The rapid progress in genomics and proteomics technologies creates a unique opportunity to dramatically improve the predictive power of safety assessment and to accelerate the drug development process. Application of gene and protein expression profiling promises to improve lead selection, resulting in the development of drug candidates with higher efficacy and lower toxicity. The identification of biologically relevant surrogate markers correlated with treatment efficacy and safety holds great potential to optimize the monitoring of pre-clinical and clinical trials.
The availability of genome-scale DNA sequence information and reagents has radically altered life-science research. This revolution has led to the development of a new scientific subdiscipline derived from a combination of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the identification of potential human and environmental toxicants, and their putative mechanisms of action, through the use of genomics resources. One such resource is DNA microarrays or "chips," which allow the monitoring of the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technology and to present our view of the usefulness of microarrays to the field of toxicology. Mol. Carcinog. 24:153-159, 1999. © 1999 Wiley-Liss, Inc.
INTRODUCTION

Technological advancements combined with intensive DNA sequencing efforts have generated an enormous database of sequence information over the past decade. To date, more than 3 million sequences, totaling over 2.2 billion bases [1], are contained within the GenBank database, which includes the complete sequences of 19 different organisms [2]. The first complete sequence of a free-living organism, Haemophilus influenzae, was reported in 1995 [3] and was followed shortly thereafter by the first complete sequence of a eukaryote, Saccharomyces cerevisiae [4]. The development of dramatically improved sequencing methodologies promises that complete elucidation of the Homo sapiens DNA sequence is not far behind [5].

To exploit more fully the wealth of new sequence information, it was necessary to develop novel methods for the high-throughput or parallel monitoring of gene expression. Established methods such as northern blotting, RNase protection assays, S1 nuclease analysis, plaque hybridization, and slot blots do not provide sufficient throughput to effectively utilize the new genomics resources. Newer methods such as differential display [6], high-density filter hybridization [7,8], serial analysis of gene expression [9], and cDNA- and oligonucleotide-based microarray "chip" hybridization [10-12] are possible solutions to this bottleneck. It is our belief that the microarray approach, which allows the monitoring of expression levels of thousands of genes simultaneously, is a tool of unprecedented power for use in toxicology studies.



Almost without exception, gene expression is altered during toxicity as either a direct or indirect result of toxicant exposure. The challenge facing toxicologists is to define, under a given set of experimental conditions, the characteristic and specific pattern of gene expression elicited by a given toxicant. Microarray technology offers an ideal platform for this type of analysis and could be the foundation for a fundamentally new approach to toxicology testing.

MICROARRAY DEVELOPMENT AND APPLICATIONS 

cDNA Microarrays 

In the past several years, numerous systems were developed for the construction of large-scale DNA arrays. All of these platforms are based on cDNAs or oligonucleotides immobilized to a solid support. In the cDNA approach, cDNA (or genomic) clones of interest are arrayed in a multi-well format and amplified by polymerase chain reaction. The products of this amplification, which are usually 500- to 2000-bp clones from the 3' regions of the genes of interest, are then spotted onto solid supports by using high-speed robotics. By using this method, microarrays of up to 10 000 clones can be generated by spotting onto a glass substrate [13,14]. Sample detection for microarrays on glass involves the use of probes labeled with fluorescent or radioactive nucleotides.

Fluorescent cDNA probes are generated from control and test RNA samples in single-round reverse-transcription reactions in the presence of fluorescently tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which produces control and test products labeled with different fluors. The cDNAs generated from these two populations, collectively termed the "probe," are then mixed and hybridized to the array under a glass coverslip [10,11,15]. The fluorescent signal is detected by using a custom-designed scanning confocal microscope equipped with a motorized stage and lasers for fluor excitation [10,11,15]. The data are analyzed with custom digital image analysis software that determines for each DNA feature the ratio of fluor 1 to fluor 2, corrected for local background [16,17]. The strength of this approach lies in the ability to label RNAs from control and treated samples with different fluorescent nucleotides, allowing for the simultaneous hybridization and detection of both populations on one microarray. This method eliminates the need to control for variation in hybridization between arrays. The research groups of Drs. Patrick Brown and Ron Davis at Stanford University spearheaded the effort to develop this approach, which has been successfully applied to studies of Arabidopsis thaliana RNA [10], yeast genomic DNA [15], tumorigenic versus non-tumorigenic human tumor cell lines [11], human T-cells [18], yeast RNA [19], and human inflammatory disease-related genes [20]. The most dramatic result of this effort was the first published account of gene expression of an entire genome, that of the yeast Saccharomyces cerevisiae [21].
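The per-feature computation described above (the ratio of fluor 1 to fluor 2, corrected for local background) might look like the following. The sketch assumes simple subtraction of a local background estimate, with a small floor so a spot dimmer than its background cannot produce a zero or negative denominator; the `Feature` layout, the floor value and the intensities are hypothetical and are not taken from the custom software cited [16,17].

```python
from dataclasses import dataclass

@dataclass
class Feature:
    """Raw intensities for one spotted DNA feature (hypothetical layout)."""
    gene: str
    fluor1: float  # spot signal, first fluorophore channel
    bg1: float     # local background around the spot, first channel
    fluor2: float  # spot signal, second fluorophore channel
    bg2: float     # local background around the spot, second channel

def corrected_ratio(f, floor=1.0):
    """Ratio of fluor 1 to fluor 2 after subtracting local background.

    Background-subtracted values are clamped to `floor` (an assumption of
    this sketch) so the ratio is always defined and positive.
    """
    s1 = max(f.fluor1 - f.bg1, floor)
    s2 = max(f.fluor2 - f.bg2, floor)
    return s1 / s2

spots = [
    Feature("geneA", fluor1=2200.0, bg1=200.0, fluor2=700.0, bg2=200.0),
    Feature("geneB", fluor1=450.0, bg1=150.0, fluor2=1350.0, bg2=150.0),
]
for f in spots:
    print(f.gene, corrected_ratio(f))
# geneA 4.0
# geneB 0.25
```

Without the background correction, geneA's ratio would read about 3.1 rather than 4.0, which is why local background matters most for dim spots.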

In an alternative approach, large numbers of cDNA clones can be spotted onto a membrane support, albeit at a lower density [7,22]. This method is useful for expression profiling and large-scale screening and mapping of genomic or cDNA clones [7,22-24]. In expression profiling on filter membranes, two different membranes are used simultaneously for control and test RNA hybridizations, or a single membrane is stripped and reprobed. The signal is detected by using radioactive nucleotides and visualized by phosphorimager analysis or autoradiography. Numerous companies now sell such cDNA membranes and software to analyze the image data [25-27].

Oligonucleotide Microarrays

Oligonucleotide microarrays are constructed either by spotting prefabricated oligos on a glass support [13] or by the more elegant method of direct in situ oligo synthesis on the glass surface by photolithography [28-30]. The strength of this approach lies in its ability to discriminate DNA molecules based on single base-pair differences. This allows the application of this method to the fields of medical diagnostics, pharmacogenetics, and sequencing by hybridization as well as gene-expression analysis.

Fabrication of oligonucleotide chips by photolithography is theoretically simple but technically complex [29,30]. The light from a high-intensity mercury lamp is directed through a photolithographic mask onto the silica surface, resulting in deprotection of the terminal nucleotides in the illuminated regions. The entire chip is then reacted with the desired free nucleotide, resulting in selected chain elongation. This process requires only 4n cycles (where n = oligonucleotide length in bases) to synthesize a vast number of unique oligos, the total number of which is limited only by the complexity of the photolithographic mask and the chip size [29,31,32].
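The 4n cycle count follows from the fact that each cycle couples one nucleotide to every site the mask exposes, so running all four bases at each of the n positions suffices for any set of sequences, however large. The simulation below is a toy model under that assumption only: it ignores coupling efficiency, mask design and chip geometry, and the probe sequences are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def photolithographic_synthesis(targets):
    """Toy model of mask-directed, base-by-base parallel oligo synthesis.

    For each position, four cycles are run (one per nucleotide). In each
    cycle the "mask" exposes only the sites whose target has that base at
    that position, and the exposed sites are extended by one nucleotide.
    Assumes every target has the same length n; returns (oligos, cycles).
    """
    n = len(targets[0])
    if any(len(t) != n for t in targets):
        raise ValueError("all targets must have the same length")
    grown = [""] * len(targets)
    cycles = 0
    for position in range(n):
        for base in BASES:
            for site, target in enumerate(targets):
                if target[position] == base:  # site is illuminated
                    grown[site] += base       # coupling extends the chain
            cycles += 1
    return grown, cycles

# Four hypothetical 6-mer probes: 4 * 6 = 24 cycles
probes = ["ACGTAC", "TTGCAA", "GATTAC", "CCCGGG"]
oligos, cycles = photolithographic_synthesis(probes)
print(oligos == probes, cycles)   # True 24

# Every possible 3-mer (64 sequences) still needs only 4 * 3 = 12 cycles
all_trimers = ["".join(p) for p in product(BASES, repeat=3)]
oligos, cycles = photolithographic_synthesis(all_trimers)
print(len(oligos), cycles)        # 64 12
```

The cycle count depends only on oligo length, not on the number of distinct probes; the number of probes is bounded instead by how many sites the mask and chip can resolve.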

Sample preparation involves the generation of double-stranded cDNA from cellular poly(A)+ RNA followed by antisense RNA synthesis in an in vitro transcription reaction with biotinylated or fluor-tagged nucleotides. The RNA probe is then fragmented to facilitate hybridization. If the indirect visualization method is used, the chips are incubated with fluor-linked streptavidin (e.g., phycoerythrin) after hybridization [12,33]. The signal is detected with a custom confocal scanner [34]. This method has been applied successfully to the mapping of genomic library clones [35], to de novo sequencing by hybridization [28,36], and to evolutionary sequence comparison of the BRCA1 gene [37]. In addition, mutations in the cystic fibrosis [38] and BRCA1 [39] gene products and polymorphisms in the human immunodeficiency virus-1 clade B protease gene [40] have been detected by this method. Oligonucleotide chips are also useful for expression monitoring [33], as has been demonstrated by the simultaneous evaluation of gene-expression patterns in nearly all open reading frames of the yeast strain S. cerevisiae [12]. More recently, oligonucleotide chips have been used to help identify single nucleotide polymorphisms in the human [41] and yeast [42] genomes.

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo model systems, including the rat, mouse, and rabbit, to assess potential toxicity, and these bioassays are the mainstay of toxicology testing. However, in the past several decades, a plethora of in vitro techniques have been developed to measure toxicity, many of which measure toxicant-induced DNA damage. Examples of these assays include the Ames test, the Syrian hamster embryo-cell transformation assay, micronucleus assays, measurements of sister chromatid exchange and unscheduled DNA synthesis, and many others. Fundamental to all of these methods is the fact that toxicity is often preceded by, and results in, alterations in gene expression. In many cases, these changes in gene expression are a far more sensitive, characteristic, and measurable endpoint than the toxicity itself. We therefore propose that a method based on measurements of the genome-wide gene expression pattern of an organism after toxicant exposure is fundamentally informative and complements the established methods described above.

We are developing a method by which toxicants can be identified and their putative mechanisms of action determined by using toxicant-induced gene expression profiles. In this method, in one or more defined model systems, dose and time-course parameters are established for a series of toxicants within a given prototypic class (e.g., polycyclic aromatic hydrocarbons (PAHs)). Cells are then treated with these agents at a fixed toxicity level (as measured by cell survival), RNA is harvested, and toxicant-induced gene expression changes are assessed by hybridization to a cDNA microarray chip (Figure 1). We have developed a custom DNA chip, called ToxChip v1.0, specifically for this purpose and will discuss it in more detail below. The changes in gene expression induced by the test agents in the model systems are analyzed, and the common set of changes unique to that class of toxicants, termed a toxicant signature, is determined.

This signature is derived by ranking across all experiments the gene-expression data based on relative fold induction or suppression of genes in treated samples versus untreated controls and selecting the most consistently different signals across the sample set. A different signature may be established for each prototypic toxicant class. Once the signatures are determined, gene-expression profiles induced by unknown agents in these same model systems can then be compared with the established signatures. A match assigns a putative mechanism of action to the test compound. Figure 2 illustrates this signature method for different types of oxidant stressors, PAHs, and peroxisome proliferators. In this example, the unknown compound in question had a gene-expression profile similar to that of the oxidant stressors in the database. We anticipate that this general method will also reveal cross talk between different pathways induced by a single agent (e.g., reveal that a compound has both PAH-like and oxidant-like properties). In the future, it may be necessary to distinguish very subtle differences between compounds within a very large sample set (e.g., thousands of highly similar structural isomers in a combinatorial chemistry library or peptide library). To generate these highly refined signatures, standard statistical clustering techniques or principal-component analysis can be used.
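The ranking-and-matching procedure described above can be sketched in a few functions. This is an illustrative reading of the text, not the authors' software: it assumes each experiment yields a dictionary of log2 treated/control ratios, keeps only genes that move in the same direction in every experiment of a class, scores each by its weakest change (so a high score means "consistently and strongly different"), and matches an unknown profile by the fraction of signature genes it reproduces. Gene names, class labels, cutoffs and values are all hypothetical.

```python
def derive_signature(profiles, top_k=3, min_change=1.0):
    """Pick the genes changed most consistently across one toxicant class.

    `profiles` is a list of {gene: log2(treated/control)} dicts, one per
    experiment. Returns {gene: +1 or -1} for the top_k qualifying genes.
    """
    genes = set.intersection(*(set(p) for p in profiles))
    scored = []
    for g in genes:
        values = [p[g] for p in profiles]
        if all(v > 0 for v in values) or all(v < 0 for v in values):
            weakest = min(abs(v) for v in values)
            if weakest >= min_change:
                scored.append((weakest, g, 1 if values[0] > 0 else -1))
    scored.sort(reverse=True)
    return {g: sign for _, g, sign in scored[:top_k]}

def match_score(profile, signature, min_change=1.0):
    """Fraction of signature genes reproduced, in direction and size."""
    hits = sum(1 for g, sign in signature.items()
               if profile.get(g, 0.0) * sign >= min_change)
    return hits / len(signature)

def assign_mechanism(profile, signatures):
    """Return the best-matching class and the scores for every class."""
    scores = {cls: match_score(profile, sig) for cls, sig in signatures.items()}
    return max(scores, key=scores.get), scores

# Hypothetical training data: two experiments per prototypic class
training = {
    "oxidant stress": [
        {"g1": 2.5, "g2": 1.8, "g3": 0.1, "g4": -0.2, "g5": 1.2},
        {"g1": 3.0, "g2": 1.5, "g3": -0.3, "g4": 0.4, "g5": 1.1},
    ],
    "PAH": [
        {"g1": 0.2, "g2": 0.1, "g3": 3.5, "g4": 0.0, "g5": 1.4, "g6": -1.5},
        {"g1": -0.4, "g2": 0.3, "g3": 4.0, "g4": 0.1, "g5": 1.6, "g6": -2.0},
    ],
}
unknown = {"g1": 2.2, "g2": 1.3, "g3": 0.2, "g5": 1.0, "g6": 0.1}

signatures = {cls: derive_signature(p) for cls, p in training.items()}
best, scores = assign_mechanism(unknown, signatures)
print(best)   # oxidant stress
```

For very large sets of highly similar compounds, the text points to clustering or principal-component analysis rather than a simple overlap score like this one.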

For the studies outlined in Figure 2, we developed the custom cDNA microarray chip ToxChip v1.0.

[Figure 1 panel labels: Control Population; Treated Population; Cy3; RNA Isolation; Reverse Transcription; Mix cDNAs and Apply to Array; DNA "Chip"; Hybridize Under Coverslip]

Figure 1. Simplified overview of the method for sample preparation and hybridization to cDNA microarrays. For illustrative purposes, samples derived from cell culture are depicted, although other sample types are amenable to this analysis.
Figure 2. Schematic representation of the method for identification of a toxicant's mechanism of action. In this method, gene-expression data derived from exposure of model systems to known toxicants are analyzed, and a set of changes characteristic of that type of toxicant (termed the toxicant signature) is identified. As depicted, oxidant stressors produce consistent changes in group A genes (indicated by red and green circles), but not group B or C genes (indicated by gray circles). The set of gene-expression changes elicited by the suspected toxicant is then compared with these characteristic patterns, and a putative mechanism of action is assigned to the unknown agent.

The 2090 human genes that comprise this subarray were selected for their well-documented involvement in basic cellular processes as well as their responses to different types of toxic insult. Included on this list are DNA replication and repair genes, apoptosis genes, and genes responsive to PAHs and dioxin-like compounds, peroxisome proliferators, estrogenic compounds, and oxidant stress. Some of the other categories of genes include transcription factors, oncogenes, tumor suppressor genes, cyclins, kinases, phosphatases, cell adhesion and motility genes, and homeobox genes. Also included in this group are 84 housekeeping genes, whose hybridization intensity is averaged and used for signal normalization of the other genes on the chip. To date, very few toxicants have been shown to have appreciable effects on the expression of these housekeeping genes. However, this housekeeping list will be revised if new data warrant the addition or deletion of a particular gene. Table 1 contains a general description of some of the different classes of genes that comprise ToxChip v1.0.

When a toxicant signature is determined, the genes within this signature are flagged within the database. When uncharacterized toxicants are then screened, the data can be quickly reformatted so that blocks of genes representing the different signatures are displayed [11]. This facilitates rapid, visual interpretation of data. We are also developing ToxChip v2.0 and chips for other model systems, including rat, mouse, Xenopus, and yeast, for use in toxicology studies.
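The housekeeping-gene normalization described for ToxChip v1.0, in which the hybridization intensity of the housekeeping genes is averaged and used to normalize the other genes, can be sketched as follows. The sketch assumes each channel is normalized independently by dividing every signal by that channel's housekeeping mean before ratios are formed; the gene names and intensities are hypothetical, and the actual ToxChip software may differ.

```python
from statistics import mean

def normalize_to_housekeeping(intensities, housekeeping):
    """Divide every gene's signal by the mean housekeeping-gene signal.

    `intensities` maps gene -> intensity for one channel (or one array);
    `housekeeping` is the set of genes assumed unaffected by treatment.
    """
    reference = mean(intensities[g] for g in housekeeping)
    return {g: v / reference for g, v in intensities.items()}

def normalized_ratios(channel1, channel2, housekeeping):
    """Channel-1 / channel-2 ratios after housekeeping normalization."""
    n1 = normalize_to_housekeeping(channel1, housekeeping)
    n2 = normalize_to_housekeeping(channel2, housekeeping)
    return {g: n1[g] / n2[g]
            for g in n1 if g in n2 and g not in housekeeping}

hk = {"hk1", "hk2"}
ch1 = {"hk1": 1000.0, "hk2": 3000.0, "target": 4000.0}  # brighter overall
ch2 = {"hk1": 500.0, "hk2": 1500.0, "target": 1000.0}
print(normalized_ratios(ch1, ch2, hk))   # {'target': 2.0}
```

The raw target ratio of 4.0 falls to 2.0 once the two-fold overall brightness difference between channels, visible in the housekeeping genes, is removed.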

Animal Models in Toxicology Testing

The toxicology community relies heavily on the use of animals as model systems for toxicology testing. Unfortunately, these assays are inherently expensive, require large numbers of animals, and take a long time to complete and analyze. Therefore, the National Institute of Environmental Health Sciences (NIEHS), the National Toxicology Program, and the toxicology community at large are committed to reducing the number of animals used, by developing more efficient and alternative testing methodologies. Although substantial progress has been made in the development of alternative methods, bioassays are still used for testing endpoints such as neurotoxicity, immunotoxicity, reproductive and developmental toxicology, and genetic toxicology. The rodent cancer bioassay is a particularly expensive and time-consuming assay, as it requires almost 4 yr, 1200 animals, and millions of dollars to execute and analyze [43]. In vitro experiments of the type outlined in Figure 2 might provide evidence that an unknown agent is (or is not) responsible for eliciting a given biological response. This information would help to select a bioassay more specifically suited to the agent in question or perhaps suggest that a bioassay is not necessary, which would dramatically reduce cost, animal use, and time.

Table 1. ToxChip v1.0: A Human cDNA Microarray Chip Designed to Detect Responses to Toxic Insult

Gene category(a)                          No. of genes on chip
Apoptosis                                   72
DNA replication and repair                  99
Oxidative stress/redox homeostasis          90
Peroxisome proliferator responsive          22
Dioxin/PAH responsive                       12
Estrogen responsive                         63
Housekeeping                                84
Oncogenes and tumor suppressor genes        76
Cell-cycle control                          51
Transcription factors                      131
Kinases                                    276
Phosphatases                                88
Heat-shock proteins                         23
Receptors                                  349
Cytochrome P450s                            30

(a) This list is intended as a general guide. The gene categories are not unique, and some genes are listed in multiple categories.

The addition of microarray techniques to standard bioassays may dramatically enhance the sensitivity and interpretability of the bioassay and possibly reduce its cost. Gene-expression signatures could be determined for various types of tissue-specific toxicants, and new compounds could be screened for these characteristic signatures, providing a rapid and sensitive in vivo test. Also, because gene expression is often exquisitely sensitive to low doses of a toxicant, the combination of gene-expression screening and the bioassay might allow the use of lower toxicant doses, which are more relevant to human exposure levels, and the use of fewer animals. In addition, gene-expression changes are normally measured in hours or days, not in the months to years required for tumor development. Furthermore, microarrays might be particularly useful for investigating the relationship between acute and chronic toxicity and identifying secondary effects of a given toxicant by studying the relationship between the duration of exposure to a toxicant and the gene-expression profile produced. Thus, a bioassay that incorporates gene-expression signatures with traditional endpoints might be substantially [...] regimens, and cost substantially less than the current assays do.

These considerations are also relevant for branches of toxicology not related to human health and not using rodents as model systems, such as aquatic toxicology and plant pathology. Bioassays based on the fathead minnow, Daphnia, and Arabidopsis could [...] of microarray analysis. [...] traditional bioassays might also be useful for investigating some [...] toxicological [...] mixtures and the difficulties in cross-species extrapolation.

[...] assessment of exposure to chemical toxicants are based on measurements [...] (e.g., peripheral blood levels of hepatic enzymes or DNA adducts). Because gene expression is a sensitive endpoint, gene expression as measured with microarray technology may [...] new biomarker to more precisely identify hazards and to assess exposure. Similarly, [...] environment [...] monitoring capacity to measure the effect of potential contaminants on the gene-expression profiles of resident organisms. In an analogous fashion, microarrays could be used to measure gene-expression endpoints in subjects in clinical trials. The combination of [...] gene-expression data and more [...] endpoints in these trials could be used to define highly precise surrogates of safety.

Gene-expression profiles in samples from exposed individuals could be compared to the profile of the same individuals before exposure. From this information, the nature of the toxic exposure can be [...] estimated. In the future it may also be possible to estimate not only the nature but the dose of the toxicant [...] gene-expression levels. This general approach may be particularly appropriate for occupational-health applications, in which unexposed and exposed samples from the same individuals may be obtainable. For example, a pilot study of gene expression in peripheral-blood lymphocytes of Polish coke-oven workers exposed to PAHs (and many other compounds) is under consideration at the NIEHS. An important consideration [...] numerous factors, including diet, health, and personal habits. To reduce the effects of these confounding factors, it may be necessary to compare pools of control samples with pools of treated samples. In the future it may be possible to compare exposed sample sets to a national database of human-expression data, thus eliminating the need to provide an unexposed sample from the same individual. Efforts to develop such a national gene-expression database are currently under way [44,45].

However, this national database approach will require a better understanding of genome-wide gene expression across the highly diverse human population and of the effects of environmental factors on this expression.
Alleles, Oligo Arrays, and Toxicogenetics

Gene sequences vary between individuals, and this variability can be a causative factor in human diseases of environmental origin [46,47]. A new area of toxicology, termed toxicogenetics, was recently developed to study the relationship between genetic variability and toxicant susceptibility. This field is not the subject of this discussion, but it is worthwhile to note that the ability of oligonucleotide arrays to discriminate DNA molecules based on single base-pair differences makes these arrays uniquely useful for this type of analysis. Recent reports demonstrated the feasibility of this approach [41,42]. The NIEHS has initiated the Environmental Genome Project to identify common sequence polymorphisms in 200 genes thought to be involved in environmental diseases [48]. In a pilot study on the feasibility of this application to the Environmental Genome Project, oligonucleotide arrays will be used to resequence 20 candidate genes. This toxicogenetic approach promises to dramatically improve our understanding of interindividual variability in disease susceptibility.

FUTURE PRIORITIES 

There are many issues that must be addressed before the full potential of microarrays in toxicology research can be realized. Among these are model system selection, dose selection, and the temporal nature of gene expression. In other words, in which species, at what dose, and at what time do we look for toxicant-induced gene expression? If human samples are analyzed, how variable is global gene expression between individuals, before and after toxicant exposure? What are the effects of age, diet, and other factors on this expression? Experience, in the form of large data sets of toxicant exposures, will answer these questions.

One of the most pressing issues for array scientists is the construction of a national public database (linked to the existing public databases) to serve as a repository for gene-expression data. This relational database must be made available for public use, and researchers must be encouraged to submit their expression data so that others may view and query the information. Researchers at the National Institutes of Health have made laudable progress in developing the first generation of such a database [44,45]. In addition, improved statistical methods for gene clustering and pattern recognition are needed to analyze the data in such a public database.

The proliferation of different platforms and methods for microarray hybridizations will improve sample handling, data collection, and analysis, and will reduce costs. However, the variety of microarray methods available will create problems of data compatibility between platforms. In addition, the near-infinite variety of experimental conditions under which data will be collected by different laboratories will make large-scale data analysis extremely difficult. To help circumvent these future problems, a set of standards to be included on all platforms should be established. These standards would facilitate data entry into the national database and serve as reference points for cross-platform and inter-laboratory data analysis.

Many issues remain to be resolved, but it is clear that new molecular techniques such as microarray hybridization will have a dramatic impact on toxicology research. In the future, the information gathered from microarray-based hybridization experiments will form the basis for an improved method to assess the impact of chemicals on human and environmental health.
Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 [illegible]
From: "Afshari, Cynthia" <afshari@niehs.nih.gov>
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com>

You can see the list of clones that we have on our 12K array at [illegible]

We selected a subset of genes (2000+) that we believed critical to tox response and basic cellular processes and added a set of clones and [illegible] to this. We have included a set of control genes (80+) that were selected by the NHGRI because they did not change across a large set of array experiments. However, we have found that some of these genes change significantly after tox treatments and are in the process of looking at the variation of each of these 80+ genes across our experiments.
Our chips are constantly changing and being updated and we hope that our data will lead us to what the toxchip should really be.
I hope this answers your question.
Cindy Afshari

> ------ 

> From: Diana Hamlet-Cox
> Sent: Monday, June 26, 2000 8:52
> To: afshari@niehs.nih.gov
> Subject: [Fwd: Toxicology Chip]
>
> Dear Dr. Afshari,
>
> Since I have not yet had a response from Bill Grigg, perhaps he was not
> the right person to contact.
>
> Can you help me in this matter? I don't need to know the sequences,
> necessarily, but I would like very much to know what types of sequences
> are being used, e.g., GPCRs (more specific?), ion channels, etc.
>
> Diana Hamlet-Cox
> 

> Original Message
> Subject: Toxicology Chip
> Date: Mon, 19 Jun 2000 18:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
>
> Dear Colleague:
>
> I am doing literature research on the use of expressed genes as
> pharmacotoxicology markers, and found the Press Release dated February
> 29, 2000 regarding the work of the NIEHS in this area. I would like to
> know if there is a resource I can access (or you could provide?) that
> would give me a list of the 12,000 genes that are on your Human ToxChip
> Microarray. In particular, I am interested in the criteria used to
> select sequences for the ToxChip, including any control sequences
> included in the microarray.
>
> Thank you for your assistance in this request.
>
> Diana Hamlet-Cox, Ph.D.
> Incyte Genomics, Inc.
>
> --
>
> This e-mail message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies of the original message.